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1.  Introduction 


Regression  diagnostics  are  becoming  a  well-accepted  tool  in  the  practice  of  statistics. 
This  is  evidenced  not  only  by  books  devoted  to  the  subject  (e.g.,  Belsley,  Kuh  and  Welsch, 
1980;  Cook  and  Weisberg,  1982;  Atkinson,  1985),  but  also  by  the  penetration  of  the  concepts 
into  standard  texts  on  regression  (e.g.,  Weisberg,  1980)  and  the  increasingly  widespread 
availability  of  software  for  computing  the  diagnostics.  One  also  sees  the  basic  leave-one-out 
diagnostic  idea  for  linear  regression  begin  carried  over  to  somewhat  more  complicated 
settings  such  as  logistic  regression  (Pregibon,  1981)  and  Cox  regression  (Storer  and  Crowley, 
1985). 

However,  the  literature  appears  to  be  relatively  devoid  of  analogous  results  in  the  time- 
series  setting,  in  spite  of  a  rather  obvious  way  to  obtain  leave-one-out  diagnostics  in  the 
context  of  ARIMA  model  fitting  for  time  series,  at  least  in  principle:  One  deletes  a  single 
observation  at  a  time,  and  for  each  deletion  computes  a  Gaussian  maximum  likelihood 
estimate  for  missing  data  (see,  for  example,  Jones,  1980;  Harvey  and  Pierse,  1984;  Kohn  and 
Ansley,  1986).  It  should  be  noted  that  use  of  Gaussian  MLE’s  for  missing  data  entails 
intuitively  appealing  use  of  predictions  in  place  of  missing  data.  A  diagnostic  display  is 
obtained  by  comparing  the  leave-one-out  MLE’s  with  the  Gaussian  MLE’s  for  the  full  data 
set  versus  time,  on  an  appropriate  comparison  scale.  This  idea  was  articulated  some  time 
ago  by  Brillinger  (1966),  but  only  the  advent  of  powerful  computers  and  algorithms  for 
fitting  ARMA  and  ARIMA  models  with  missing  data  has  placed  actual  use  of  the  procedure 
within  reach. 

In  this  paper  we  demonstrate  the  efficacy  of  observation  deletion  diagnostics  for  time 
series,  addressing  in  the  process  some  issues  which  are  special  to  the  time  series  setting.  In 
particular,  we  consider  not  only  diagnostics  based  on  ARIMA  model  coefficients,  but  also 
diagnostics  based  on  the  innovations  variance.  We  show  that  the  time  series  problem  gives 
nse  to  a  “smearing”  effect  which  is  not  encountered  in  the  usual  independent-observation 
setting.  For  diagnostics  based  on  coefficients,  this  smearing  can  result  in  considerable 
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ambiguity  concerning  the  numbers  and  locations  of  outliers.  By  both  examples  and  by 
theoretical  calculations,  we  show  that  diagnostics  based  on  the  innovations  variance  is  far 
superior  to  coefficient-based  diagnostics  in  this  regard. 

Furthermore  outliers  frequently  occur  in  patches  in  the  time  series  setting.  Thus  we 
proposed  a  “leave-k-out”  diagnostic  approach  which  is  both  effective  and  within  computa¬ 
tional  reach. 

The  paper  is  organized  as  follows.  Gaussian  maximum  likelihood  estimation  of 
ARIMA  models  with  missing  data  is  reviewed  in  Section  2.  Section  3  presents  the  basic 
“leave-/!: -out”  diagnostic,  based  on  the  coefficients  and  innovations  variance,  including  a 
proposal  for  scaling.  Some  artificial  examples  are  given  which  illustrate  that  the  innovations 
variance  is  a  better  diagnostic  tool.  Analytical  results  on  “smearing”  effects  associated  with 
leave-k-out  diagnostics  are  presented  in  Section  4.  The  problem  of  outlier  type  identification 
is  discussed  briefly  in  Section  5.  Section  6  presents  an  iterative  deletion  procedure  to  over¬ 
come  problems  caused  by  masking.  Techniques  are  also  discussed  for  handling  other  types 
of  disturbances,  such  as  level  shifts  and  variance  changes.  Finally,  we  give  an  overall  stra¬ 
tegy  for  ARIMA  model  identification  and  fitting  using  the  leave-k-out  diagnostics.  This  stra¬ 
tegy  is  applied  in  Section  7  to  two  real  data  sets.  Finally,  possible  extensions,  computational 
aspects,  scaling  issues,  and  a  connection  with  robust  filtering  are  briefly  mentioned  in  Sec¬ 
tion  8. 


2.  Estimation  of  ARIMA  Models  with  Missing  Data 

Exact  maximum  likelihood  estimates  with  missing  data  can  be  obtained  using  the  state 
space  representation  of  an  ARIMA  model.  Various  formulations  have  been  given  by  Jones 
(1980),  Harvey  and  Pierse  (1984)  and  Kohn  and  Ansley  (1986).  The  Harvey  and  Pierse 
approach  is  used  here.  The  Kohn  and  Ansley  approach  has  an  attractive  feature  which  we 
comment  on  in  Section  8. 


2.1  The  Model 

Consider  a  nonstationary  process  xt,  t-  1, . . .  ,n,  which  can  be  represented  by  an 
ARIMA  (p  ,d,q)x(P  ,D  ,Q)  model 

<1 >(Bs)^(B)VdVj>  xt  =  y+B(Bs)9(B)et  (2.1) 

where  the  £f ,  are  the  innovations.  These  are  assumed  to  be  independent  normal  random 
variables  with  zero  mean  and  variance  o  2  .  B  is  the  backshift  operator,  and  the  regular  and 
seasonal  difference  operators  are  V  =  (1-5 ),  V,=(l -B1),  respectively.  The  intercept 
term  is  y>  the  ordinary  autoregressive  and  moving  average  operators  are 

4>(fl)  =  l-4>i  B - <t >pBP,  9(B)  =  1-0!* - (2.2) 

and  the  corresponding  “seasonal”  operators  are 

®(BS)  =  l-OjB* - -<t>pBsP,  S(BS)  =  1  -Q{BS - QqB*2  .  (2.2’) 

Let  a  denote  the  r  x  1  vector  of  parameters, 


=  (4>i,  ’  ‘ '  ,4>b  .^i* 


,9j  ,©i 


)' 


where  r=p+P+p+2,  Assume  that  the  polynomials  in  (2.2)-(2.2’)  have  their  roots  out¬ 
side  the  unit  circle,  so  that  the  process  w,  =  Vd  xt  is  stationary  and  invertible. 
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2.2  Kalman  Filter  Representation  of  the  Likelihood  Function 


The  state  space  formulation  of  (2.1)  is  based  on  the  vector  Markov  state  transition  equa- 


x,  =  Tx,_!  +  re, 


where  x,  is  an  mxl  state  vector,  T  is  an  mxm  transition  matrix,  r  is  an  mxl  vector, 


the  e,  are  as  in  (2.1),  and  m  =  max(p  +sP  +d+sD,q  +sQ  +1).  The  values  of  the  pro¬ 


cess  x ,  are  related  to  the  state  vector  x,  via  the  noise-free  observations  equation 


x.  =  zx, 


where  zT  =  ( 1,0,  ■  •  •  ,0) 


Let  xt  denote  the  optimal  linear  (Kalman)  filter  estimate  of  xf  (i.e.,  xf  minimizes  the 


mean-squared  error  and  depends  on  the  data  xx,  •  •  ■  ,xt ),  and  let  a2 Pt  be  the  covariance 


matrix  of  filtering  error  xf-xf.  Also,  let  xt|f_i  =Tx,_!  be  the  optimal  one-step-ahead 


predictor  of  x, ,  and  let  o2  Pf|,_i  denote  the  corresponding  prediction  error  covariance 


matrix.  The  Kalman  filter  provides  a  well-known  method  for  recursively  evaluating  xf , 


*r|r-l  >  Pf  >  • 


—  +ft  lP»|r-lz'er 


P.I.-1  -  TP,.,T'  +  rr'o2 


Pf  —  P(it-i  ft  P.' , — i  ^"P,  i ,  —  i 


where  e,  is  the  observation-prediction  residual 


et  a  xt-E[xt\xx . jq.J  =  jq-z'x,,,^ 


and  ft  is  the  associated  observation-prediction  error  variance 


ft  ~  E[et  |x  j,  .  .  .  , —  zT>,|f_jZ  —  (Pf|r-i)n  • 
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The  choice  of  initial  condition  is  discussed  shortly. 

The  log-likelihood  is  conveniently  expressed  in  terms  of  et  and  f,  (see,  for  example, 
Harvey  (1981)): 

log  L  ( x„ ,  a,  o  2 )  =  --^log2ji-ylog o2-{il°g/(  (2.8) 

2  2  2r=i  2oz  ,=i 

where  x„  =  (x  t,  •  •  ■  ,xn  )T .  Note  that  /,  =/,  (a)  and  et  =  et  (a) . 

Concentrating  out  a2  in  (2.8),  the  maximum  likelihood  estimate  a  is  given  by 

a  =  argmin  {  £  log/,  (a)  +  log  (  £  et  2(a)//f  (a) ) }  .  (2.9) 

i=i  t= l 

If  observation  f0  's  missing,  then  the  corresponding  term  in  (2.9)  is  dropped,  no  update  is 
performed  in  the  Kalman  filter,  and  x r<>=  i  r0|r0— 1  • 

When  a  nonstationary  ARIMA  process  is  differenced  to  produce  stationanty,  the  log- 
likelihood  is  given  by  (2.8)  applied  to  the  differenced  series  w, »  V  d  Vf*  x  t . 


2.3  The  Special  Case  of  Stationarity 

Consider  the  special  stationary  ARMA  process  case  of  (2.1).  Ignoring  seasonal  com¬ 
ponents,  we  use  the  state  space  formulation  preferred  by  Harvey  and  Phillips  (1979),  among 
others,  and  choose  T  and  r  to  be 


r'  =  0,-0! . ~Qm-X)T 


(2.10) 


where  m=max(p,q  +  1),  $,  =  0  for  i>p  and  9,  =  0  for  i>q  +  1.  Under  stationar.rv 


the  initial  conditions  for  (2.6)  are  x1|0=0  and  P[  j0  which  satisfies 


Piio  =  TP  1 1  qT  +  rr  (2.11) 

For  numerical  solution  of  (2.1 1),  see  Gardner,  Harvey  and  Phillips  (1980). 

To  ensure  stationarity,  the  optimization  of  (2.9)  over  <>1 ,  •  •  •  ,§p  must  be  constrained 


so  that  the  roots  of  the  polynomial  equation  $(fi )  =  0  lie  outside  the  unit  circle.  This  is 
easily  done  by  first  reparameterizing  in  terms  of  the  partial  autoregressive  coefficients. 
bn  j=1,  .  .  .  ,p  ,  and  carrying  out  an  unconstrained  minimization  over  the  transformed  par¬ 
tial  autogression  coefficients  u, ,  i  =  1 . p  ,  where 


bi  = 


1-e 


1+e 


(2.12) 


The  parameters  <j>1(  •  •  •  ,<pp  are  obtained  by  the  Levinson  (1947)-Durbin  (1960)  recur¬ 
sions.  See  Jones  (1980)  who  also  pointed  out  that  invertibility  of  the  estimated  model  can  be 
assured  by  using  partial  moving  average  coefficients. 


2.4  Nonstationary  Models 

Now  consider  the  general  ARIMA  model  without  seasonal  terms.  One  possible 
approach  is  to  apply  the  state  space  model  (2.10)  to  the  differenced  series  w, .  However, 
with  missing  observations,  this  procedure  is  undesirable  since  the  scries  w,  will  have 
(d+l)(D+l)  times  as  many  missing  values  as  xt .  An  alternative  method  due  to  Har/ey  and 
Pierse  (1984)  avoids  this  difficulty  by  utilizing  a  levels  formulation  of  the  state  space  model 

The  levels  formulation  is  based  on  separating  the  stationary  and  nonstationary  pans  o: 
x,  in  the  state  vector  x, .  Let  d0=d+sD  ,  m^w)=  max(p,q+\),  and  m  =  m(-w)  +  d0.  The 
state  vector  is  xr  =  ( x/w),,xr°_1')  where  x,('*)  is  a  m(w|xl  state  vector  for  h,  and 
x?_i'=  (xf_j,  .  .  .  ,xt_d_sD )  is  a  do*  l  vector  of  past  observations  he: 


S'  =  (51( .  .  .  ,5io),  where  -5 ;  are  the  coefficients  of  the  polynomial  ,  so  that 

do 

V  d  V,  =1-2  8;5 1 .  Then  the  new  transition  equation  is  given  by  (2.4)  with 
;=i 


Jml"Kd0 


0,0-i 


(2.13) 


r'  =  (r(w)/,<L') 


where  Zt(w),=  ( 1, ) ,  T*w)  and  r(w)  are  the  same  as  in  (2.10).  The  new 
measurement  equation,  given  by  (2.5)  with  Z't  =  ( Z,(w)',  8') ,  is  essentially  an 
undifferencing  operator. 

The  Kalman  filter  is  initialized  at  time  t0=d0  with  ^,,o+I|d0=  (0m',x'°B)  and 


1  do 


Piio  0 

0  0 


(2.14) 


v  ^  j 

where  P^q  is  the  solution  to  (2.11)  with  T  and  R  replaced  by  T(w)  and  R(vv) 
respectively.  With  no  missing  data,  the  likelihood  computed  from  the  observations  x,  is 
identical  to  that  computed  from  the  differenced  observations  wt .  The  likelihood  is 
maximized  as  before,  using  partial  autoregressive  and  moving  average  coefficients  to  ensure 
stationarity  and  invertibility  of  w, . 

Note  that  this  approach  requires  d0  consecutive  observations  at  the  beginning  of  the 
senes.  If  a  missing  value  occurs  in  the  beginning,  then  the  likelihood  can  be  computed  by 
reversing  the  series  (i.e.,  ordering  the  data  by  xn,xn  _  t,  •  •  •  ,x  i )  and  applying  the  Kalman 
filter  to  the  reversed  series.  If  there  are  missing  values  at  both  ends  of  the  series,  this 
approach  will  not  work.  In  this  case,  the  formulation  proposed  by  Kohn  and  Ansley  (1986) 


can  be  applied:  see  Section  8  for  further  discussion. 

2.5  Seasonal  Models 

The  extension  to  the  general  seasonal  ARIMA  model  given  by  (2.1)  follows  from  ex¬ 
pansion  of  the  autoregressive  and  moving  average  ordinary  and  seasonal  operators. 

In  the  stationary  case,  the  state  vector  xf  has  length  m  =  max  {p  +  sP ,  q  +  sQ+  1) , 
and  the  parameters  <j>lt  ■  ■  •  ,4>p  and  0lt  are  replaced  by  the  appropriate 

coefficients  of  <j>(B)<X>(B)  and  0(B) ©(B).  Stationarity  and  invertibility  arc  assured  by 
transforming  each  of  the  ordinary  and  seasonal  parts. 

The  nonstationary  case  is  extended  to  seasonal  models  in  the  same  fashion. 


~ “wP  'T* 


»’¥  W  —  ^  • 
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3.  Leave-k-Out  Diagnostics  for  ARIMA  Models 

In  this  section,  we  describe  our  basic  leave-k-out  diagnostics  for  both  ARIMA  model 
coefficient  estimates,  and  the  innovations  variance  estimate.  As  we  shall  see  by  example  in 
several  simulated  series,  the  diagnostics  based  on  the  innovations  variance  yield  much 
sharper  results  than  those  for  the  coefficients. 


3.1  The  Basic  Leave-k-out  Diagnostics 
Diagnostics  for  Coefficients 

Denote  the  maximum  likelihood  estimate  (MLE)  of  a  by  a.  Let 

A={fi,r2,...,fk}  be  an  arbitrary  subset  of  {1,2 . «},  and  let  aA  denote  the 

MLE  with  observations  yt  ,  ' '  ,y,t  treated  as  missing.  If  some  of  the  observations  in  A 
have  an  undue  influence  on  the  estimate  aA  ,  then  this  will  often  reveal  itself  in  the  form  of 
a  substantial  difference  between  a  and  aA .  We  define  the  empirical  influence  on  the 
coefficients  of  the  subset  A  by 

EI(A)  =  -  n  (dA  -  a) .  (3.1) 

Standardizing  by  the  factor  n  leads  to  a  non-degenerate  asymptotic  form  for  (3.1)  (see 
Appendix  B). 

In  the  case  of  independent  observations,  one  almost  always  deletes  a  single  observation 
at  a  time  and  computes  various  diagnostic  statistics.  However,  the  time  senes  situation 
differs  from  the  case  of  independent  observations  in  at  least  two  important  wavs 
(a)  structure  is  imposed  by  time  ordering,  and  (b)  influential  observations  often  come  in  the 
form  of  an  "outlier  patch"  or  other  local  "structural"  change  extending  over  several  observa¬ 
tions.  Leave-one-out  diagnostics  can  fail  to  give  clear  evidence  of  influence  in  the  case  ■  : 
patchy  disturbances  such  as  outliers  (an  example  of  this  is  provided  below).  Such  behav  - 
might  be  regarded  as  a  form  of  "masking"  since  the  effect  of  any  single  outlier  in  v.. 


k' 


'vVvVV 


„*  V  s 


mmw  *>*  *v'*n*'*\/"*  *■**  ^ 9  •*.  a  .■ ,  •  ■*„  ,*4  .*,  *«,*  v  ■  v 
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patch  can  be  overwhelmed  by  the  effect  of  the  other  outliers.  Fortunately,  this  kind  of  situa¬ 
tion  is  easily  dealt  with  in  time  series  (unlike  as  in  unstructured  independent  observation 
problems)  by  leaving  out  k  consecutive  observations;  that  is,  by  taking  A  =  Ak  t  to  consist 
of  the  k  time  points  centered  at  r :  (r  -  [ — - — ],  •  •  • ,  t  +  [  — ])  ,  where  [ x  ]  denotes  the 

largest  integer  less  that  or  equal  to  x  .  To  simplify  notation,  denote  EI(At  f)  by  EI(£,r) 
and  dAt  i  by  dk  ,  . 

For  patches  at  the  ends  of  the  series,  where  t  £  [  - ~ — j  or  t  >  n  -[  y] ,  EI(£,  t) 

is  computed  with  the  patch  truncated  in  the  obvious  manner.  For  nonstationary  models,  the 
series  will  be  reversed  to  obtain  EI(4,  r)  for  r  =  1,  •  •  •  ,ri0+[  1  ]  where  d0  is  the 

order  of  the  differencing  (see  the  comments  under  nonstationary  models  in  Section  2). 

A  strategy  for  determining  the  largest  k  that  needs  to  be  considered  for  a  given  data 
set  will  emerge,  based  on  the  empirical  examples  of  Section  3.2. 

Standardizing  El 

The  empirical  influence  EI(  k ,  t )  is  an  r  -dimensional  vector,  and  as  such  is  difficult  to 
interpret  Further,  the  empirical  influence  is  relative,  and  comparable  only  within  a  data  set. 
Hence  it  is  useful,  as  in  the  ordinary  regression  context,  to  consider  a  quadratic  form  diag¬ 
nostic  measure  of  influence  for  coefficients 

DC(k,t)  =  EV(k,t)  MEI(k,  r)  (3.2) 

where  M  is  an  appropriate  positive  semi-definite  rxr  matrix.  As  in  the  regression  setting, 

A 

it  is  natural  to  choose  M  to  be  the  inverse  of  covariance  matrix  of  a . 

Although  the  exact  covariance  matrix  for  d  is  not  known,  it  can  be  approximated  by 
the  asymptotic  information  matrix  1(a).  It  is  well  known  that  a  is  asymptotically  normal 
under  regularity  conditions  (see,  for  example,  Fuller,  1976); 

'fn  (a-a)  ->  Nr( 0, 1(a)-1).  (-  - 


If  1(a)  is  a  consistent  estimator  of  1(a) ,  then  the  Mann-Wald  theorem  implies  that 

n(a-a)' i(a)(a-a)  ->  x2  (3.4) 

where  x 2  denotes  a  chi-square  random  variable  with  r  degrees  of  freedom.  Thus,  it  is 
natural  to  choose  M  to  be  n~ 1  i(a) . 

One  estimator  of  1(a)  is  1(a) ,  the  expected  information  evaluated  at  the  maximum 
likelihood  estimate.  Although  not  commonly  available  in  the  literature,  a  closed  form 
expression  for  1(a)  in  terms  of  a  exists  (this  expression  is  derived  in  Appendix  A).  Using 
this  expression,  we  take  as  our  leave-k-out  diagnostic  for  coefficients 

DC(k,t)  =  —  EI'(A: ,  t )  I(a)EI(  k,t)  (3.5) 

n 

=  n(a-ak  l)' I(d)(d-dk>1) . 

Although  the  distribution  of  DC(k,t)  is  not  known,  the  use  of  the  Xr  distribution 
allows  one  to  view  DC ( k,t )  on  a  familiar  scale.  This  corresponds  to  using  the  F  distri¬ 
bution  as  a  reference  for  Cook’s  Distance  (Cook  and  Weisberg,  1982)  and  DFFTTS  (Belsley 
et  al.,  1980).  The  x?  distribution  is  used  in  the  time  series  case,  rather  than  an  F  distri¬ 
bution,  since  1(a)  does  not  involve  the  nuisance  parameter  o2. 

Following  previous  applications  of  leave-k-out  diagnostics,  we  recommend  judging  a 
point  or  patch  of  points  to  be  influential  if  the  "p -value"  of  DC(k,t)  based  on  the  Xr 
reference  distribution  is  smaller  than  .5.  Empirical  evidence  shows  this  guideline  is  quite 
useful,  except  near  the  region  of  noninvertibility  or  nonstationarity. 


Diagnostics  for  the  Innovations  Variance 


The  influence  of  a  subset  A  can  also  be  measured  by  evaluating  the  effect  of  its  removal 
on  the  MLE  of  the  innovations  variance  estimate  6 2 .  The  innovations  variance  is  a 
nuisance  parameter,  and  at  first  thought  it  might  therefore  appear  to  have  less  intuitive  appeal 
as  a  basis  for  leave-k-out  diagnostics  than  the  coefficient  estimates  a .  However,  it  turns  out 
that  a  diagnostic  based  on  the  innovations  variance  leads  to  a  more  effective  tool  than  DC 
for  identifying  outliers.  The  diagnostic  is  formed  in  the  same  manner  as  above:  A 
standardized  version  of  -(akt-a  )  is  computed,  where  okt  is  the  MLE  of  o  with 
observations  at  times  t  e  A,  k  treated  as  missing. 

Again,  the  standardization  is  based  on  asymptotic  theory.  Under  regularity  conditions, 
o  2  is  asymptotically  independent  of  a ,  and 

^fn  (02-02)  N  (0,2<J4)  .  (3.6) 


If  6 2  is  a  consistent  estimate  of  ,  then  by  the  Mann-Wald  theorem 


n_ 

2 


ii 

d2 


- 1 


Xi 


Thus,  we  propose  to  use  as  leave-k-out  diagnostic  for  innovations  variance 


DV(k,t )  =  y 


«*!» 


-i 


(3.7) 


with  the  reference  distribution  being  a  chi-squared  with  one  degree  of  freedom  (Xi2) 
Again,  one  suspects  an  observation  y,  to  be  influential  if  the  p  -value  for  DV (k,t)  is  less 
than  .  5  using  a  • 
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Relationship  Between  DV  and  Fox  Tests  for  AO 

The  difference  n  [or2- c2j]  is  asymptotically  equivalent  to  the  squared  interpolated 

residual  (x,  -x/*)2  where  x"  =  E(xt  \xt ,  t  -  1 . n  ,  l  *t).  The  interpolated  residual  was 

used  by  Fox  (1972)  as  a  basis  for  testing  the  presence  of  a  (parametric)  AO  type  outlier  at  a 
fixed  time  t.  At  first  glance  this  equivalence  may  seem  surprising  since  (o2-^2^)  is  based 
on  the  prediction  residuals  et  =  xt  -x,  and  not  on  the  interpolation  residuals  x,  -x*.  But,  by 
using  the  smoothing  form  of  the  likelihood  (Schweppe,  1973),  it  is  easy  to  show  the  claimed 
asymptotic  equivalence. 

It  is  important  to  note  that  in  spite  of  the  asymptotic  equivalence,  the  finite  sample 
differences  can  be  significant,  e.g.,  a2  -  cjj  can  be  negative,  whereas  (x,  -x/*)2  cannot 
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3.2  Outlier  Models  and  Examples 


Outlier  Models 

In  the  following  examples,  we  focus  on  influential  points  caused  by  outliers.  Influential 
observations  may  also  be  the  result  of  structural  changes,  such  as  level  shifts  or  variance 
changes.  We  shall  discuss  application  of  leave-k-out  diagnostics  to  such  problems  in 
Sections  7  and  8. 

We  examine  the  performance  of  DC  and  DV  under  two  types  of  contamination 
commonly  used  in  other  studies  (see,  for  example,  Fox,  1972;  Denby  and  Martin,  1979; 
Martin  and  Yohai,  1986;  Tsay,  1986):  the  additive  outliers  (AO)  model  and  the  innovations 
outliers  (10)  model  The  primary  focus  will  be  on  AO  models. 

Let  xt  be  a  Gaussian  ARIMA  process  specified  by  (2.1).  Then  y,  behaves  according 
to  a  constant  magnitude  AO  model  if 

yt  =  xt  +  Cz,  (3.8) 

where  £  is  constant  and  z,  is  a  fixed  0-1  process.  The  magnitude  of  the  outliers  is  £ ; 
isolated  outliers  and  patches  are  created  by  appropriate  choice  of  0’s  and  1  ’s  for  z, . 

A  constant  magnitude  10  model  is  formed  through  contamination  in  the  innovations 
process  e, .  Let  e,  be  a  contaminated  white  noise  process,  with 

Er  =  y,  +  (3.9) 

where  yt  are  independent  Gaussian  random  variables  with  zero  mean  and  variance  a  2  ,  and 
^ ,  z,  are  as  above.  Then  yt  follows  an  ARIMA  IO  model  if  it  is  generated  by  (2. 1 )  with 
the  e,  given  by  (3.9). 

The  models  (3.8)  and  (3.9)  are  actually  somewhat  special  AO  and  IO  forms.  More 
general  AO  and  IO  (and  other)  outlier  models  for  time  series  are  possible  (see  Martin  and 
Yohai,  1986,  for  a  very  flexible  "general  replacement"  model). 


'  •  •  «  *•  v  *,  *.  • 
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Example  3 .1 :  AR(1),  <J»  =  .4 ,  a2  =  1  ,  AO  model  with  1  isolated  outlier 

Starting  with  a  simple  case,  we  examine  a  simulated  AR(1)  senes  of  100  points  from 

Gaussian  white  noise  with  0  =  .4  and  a  single  additive  outlier  of  +4  at  point  28.  The  MLE 

fit  of  an  AR(1)  model  with  the  entire  series  yielded  <p  =  0. 1 7  and  c2=  1.09.  The  data  is 

plotted  in  Figure  la  and  the  outlier  is  marked  by  "o”.  The  leave-l-out  diagnostics,  DC  (1,  ) 

*  -*  1 

and  DV(1,  ■ ) .  for  o  and  a  are  displayed  in  Figures  lb  and  1c.  The  p -values  correspond¬ 
ing  to  a  X\  distribution  are  displayed  on  the  right  axis.  The  p  -values  for  DC  ( 1 ,  r )  at 
t  =  27, 28, 29  are  all  smaller  than  .5,  while  the p  -value  for  DC  (1,  t)  is  considerably  greater 
tha  .5  for  all  other  times.  Thus  y  27 ,  y  ag ,  and  y  29  are  judged  to  be  influential.  By  contrast, 
only  £>V(1, 28)  is  significant  and  has  the  much  smaller  p  -value  of  about  .02.  This  example 
is  indicative  of  a  general  pattern  which  we  establish  analytically  in  Section  4:  an  outlier  is 
smeared  across  several  values  of  DC  (1,  • )  ,  but  is  identified  exactly  by  DV  (1,  • ) .  In  partic¬ 
ular,  the  smearing  for  DC  (1,  • )  extends  by  one  time  unit  in  each  direction  from  t  =28. 

Example  3.2:  AR(  1),  4>  =  .4 ,  o2  =  1 ,  IO  model  with  1  isolated  outlier 

This  is  the  same  series  as  in  Example  3.1,  except  that  the  outlier  at  point  28  is  of  the  IO 
type.  The  MLE’s  of  the  parameters  are:  $  =  .27  and  o2=  1.06.  Figures  2a,  2b,  and  2c 
display  the  data  and  leave-l-out  diagnostics  DC (l,  )  and  DV(  1,  ).  The  problem  of 
smearing  for  DC  is  considerably  worse  than  in  Example  1.  The  diagnostic  is  significant 
only  at  time  r= 27 ,  while  the  p -value  of  about  .7  at  r  =  28  is  quite  insignificant.  Howeve-. 
there  is  no  smearing  with  DV(1,):  DV(  1,28)  is  several  magnitudes  larger  than  DV(  1 ,  r ; 
for  any  other  t ,  and  hence  DV  identifies  the  outlier.  Again,  this  is  a  general  behavior,  esta¬ 
blished  in  Section  4:  DC  is  large  at  time  points  just  prior  to  the  occurrence  of  an  isolated 
IO  type  outlier,  but  is  small  at  the  time  of  occurrence  of  the  outlier,  while  DV  is  large  or.: . 
at  the  time  of  occurrence  of  the  outlier. 


-1 


c)  Scaled  Leave-1 -Out  Diagnostics:  Innovations  Variance 
Example  3.1 :  Simulated  AR(.4)  With  1  Isolated  Additive  Outlier 

Figure  1 
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In  all  examples  to  follow,  plots  of  DC  are  omitted  for  simplicity.  However,  in  Sec¬ 
tion  5,  we  discuss  the  use  of  DC  as  a  tool  to  determine  whether  an  outlier  is  AO  or  IO. 

Example  3.3:  MA(1),  0  =  -  .5  ,  (7  =  1 ,  AO  model  with  1  patch  and  1  isolated  outlier 

This  example  is  a  simulated  MA(1)  series  with  0=  -  .5  and  both  a  patch  of  three 
outliers  of  size  +4  at  points  60-62  and  an  isolated  outlier  of  size  -4  at  time  15.  The  outliers 
are  all  of  the  AO  type.  Figure  3a  shows  the  data;  the  MLE’s  are  6  =  -.  36  and  a2  =  1.86. 
Leave- 1 -out  through  leave -4-out  diagnostics  for  DV  are  displayed  in  Figures  3b-3e. 

Recall  that  for  k  S  2 ,  DV(k,t )  represents  the  influence  of  a  patch  of  k  observations 
centered  at  t .  For  even  k  ,  t  is  the  closest  point  to  the  left  of  the  "center”  of  the  patch.  For 
example,  with  k-2,  DV(k,t)  corresponds  to  the  diagnostic  computed  when  ye  and 
y,  + 1  are  left  out 

Although,  leave- 1 -out  diagnostics  clearly  identify  the  isolated  outlier,  there  is  just 
barely  an  indication  (use  the  p  -value  of  .5  as  a  guideline)  of  something  going  on  at  t  =  60 
and  62.  Leave- 1 -out  is  not  adequate  for  detecting  the  patch  of  outliers.  Leaving  a  single 
point  out  in  the  patch  is  insufficient  because  the  remaining  outliers  in  the  patch  comprise  the 
bulk  of  the  influence  of  the  patch.  Lcave-2-out  and  leave-3-out  provide  progressively 
stronger  evidence  of  the  patch  of  outliers.  The  value  Z^V  (3,61),  is  over  five  times  larger 
than  other  ncighborir  g  diagnostic  values. 

3.3  Patch  Length  Determination  Strategy 

Note  that  the  isolated  outlier  in  Figure  3  is  smeared  in  the  leave-k-out  diagnostics  for 
k  =2,3,4.  For  k=2,  both  DV(2,  14)  and  DV  ( 2,  15)  are  highly  significant,  and  have 
nearly  the  same  value  as  DV (1,  15) .  Similar  behavior  is  observed  for  k  =  3  and  k  =  4  The 
general  pattern  is  as  follows:  k- 1  values  of  DV (k  ,  )  surrounding  the  location  of  an  isolated 
outlier  at  r0  are  significant,  and  have  nearly  the  save  value  as  DV (k  ,r0) !  This  correspordN 


tismatrt  tn»a  -  -0.36 
•somaiaa  sigma  .  1 .86 


a)  Plot  of  Data 


b)  Scaled  Leave-1  -Out  Diagnostics:  Innovations  Variance 


c)  Scaled  Leave-2-Out  Diagnostics:  Innovations  Variance 
Example  3.3:  Simulated  MA(-,5)  With  1  Patch  and  1  Isolated  Additive  O 


to  what  one  might  intuitively  expect  for  an  isolated  outlier:  deletion  of  a  patch  which 
includes  an  isolated  outlier  has  nearly  the  same  affect  as  deleting  only  the  isolated  outlier 

Similar  behavior  occurs  for  a  patch  of  outliers.  For  example,  DV  ( 4,  r)  yields  values 
at  r  =  60  and  t  =  61  which  are  nearly  equal  to  DV  ( 3,  61).  In  general,  for  a  patch  of  k0 
outliers  centered  at  r0  ,  the  following  property  holds: 

•  For  k  2.  k0,  there  are  k  -k0+  l  subsets  AkJ  which  completely  overlap  the  patch, 
and  for  deletion  of  these  subsets,  the  magnitude  of  DV(k,t )  is  roughly  the  same 
and  significant  (i.e.,  the  associated  p-value  is  less  than  .5). 

Thus,  we  judge  an  influential  patch  to  be  of  length  k 1  centered  at  r0  if  DV(/t0,r0)  is 
significant,  and  the  above  property  holds.  If  DV(k0,t  0)  is  significant  and  the  above  pro¬ 
perty  fails  to  hold,  then  this  is  an  indication  that  a  broader  patch  of  outliers  is  present 

This  provides  us  with  an  initial  strategy  for  identifying  patches  of  influential  points: 
Compute  leave-k-out  diagnostics  for  increasing  k  =  1,2,...,  until  the  magnitude  of 
DV(k,t  )  does  not  "significantly"  increase  for  any  t .  The  length  of  a  patch  will  be 
estimated  as  one  less  than  the  first  value  of  k  for  which  nearly  uniform  smearing  is  in  evi¬ 
dence.  We  shall  improve  this  strategy  by  incorporating  iterative  deletion  in  Section  6. 

We  close  this  section  with  a  caveat:  Diagnostics  in  MA  models  often  exhibit  different 
characteristics  than  in  AR  models.  In  particular,  the  smearing  for  DC  is  slightly  worse  than 
the  AR(1)  case.  Also,  MA  models  are  susceptible  to  “start-up”  effects:  outliers  at  the  ends 
of  a  series  are  subject  to  more  smearing.  These  features  correspond  to  the  fact  that  MA 
models  have  infinite  autoregressive  representations. 


4.  Smearing  and  the  Expected  Diagnostic 


The  examples  of  Section  3  have  revealed  a  major  difference  between  leave-k-out 
coefficients  diagnostics  for  time  series  (including  k  =  1 )  and  the  usual  regression 
coefficients  diagnostics  for  independent  data.  Namely,  there  is  a  smearing  of  the  effect  of  an 
isolated  outlier  to  adjacent  points.  A  given  point  may  be  judged  influential  because  of  an 
outlier  at  an  adjacent  point  Hence,  interpretation  of  leave-k-out  diagnostics  for  coefficients 
is  not  so  clear  as  in  the  usual  regression  case.  On  the  other  hand,  diagnostics  for  the 
innovations  variance  in  Section  3  displayed  much  smaller,  and  often  negligible,  smearing 
effects.  In  this  section,  we  use  an  asymptotic  approximation  to  establish  an  analytical 
rationale  for  these  different  smearing  effects.  Although  the  approach  will  work  for  any 
ARIMA  model,  the  computations  are  quite  tedious  for  all  but  low  order  models  (where  they 
are  also  tedious).  Thus,  after  introducing  the  general  expressions  for  the  limiting  forms  of 
the  diagnostics  in  Sections  4.1  and  4.2,  we  concentrate  on  obtaining  explicit  calculations  for 
the  AR(1)  case  in  Sections  4.3  and  4.4. 


4.1  Expected  Asymptotic  Diagnostic  for  Coefficients 

In  order  to  understand  the  smearing  behavior  of  the  diagnostic  (3.2)  for  coefficients,  it  is 
helpful  to  use  an  asymptotic  representation  of  DC  (A  )  for  general  subset  deletions  A  For 

subsets  Akl  of  fixed  size  k  one  usually  has  a  -  ak  ,  =  0(  — ) ,  and  correspondingly  we  are 

*  fl 

interested  in  the  asymptotic  behavior  of  n  •  DC  (A ) .  Since  the  asymptotic  distribution  of 
this  quantity  is  quite  complicated,  we  work  with  the  expected  asymptotic  diagnostic  for  the 
coefficients 

EDC  (A)  =  E  [  lim  n  DC  (A  )  ]  4  ; 


=  £  [  lim  n2(d-aA  )' 1(a)  (a  -  a* )  ] 


Explicit  computations  of  (4.1)  are  based  on  one-step  approximations  to  aA  by  a. 
Denote  the  (efficient)  score  function  'En(a)  and  denote  its  derivative  x¥n  (a) : 

.  9  log  L  (a;  y„)  ;  32  log  £  (a:  y„ ) 

M'"(al  *  - 3^ -  T"<0)  “  3a3a- 

where  the  log-likelihood  L(a;  y„)  is  given  by  (2.8)  with  x„  replaced  by  y„  and  a2 
suppressed.  Let  log  (a;  y„)  be  the  log-likelihood  with  subset  A  removed,  and  let 
'F^/')(a)  and  'F^4)(a)  denote  the  corresponding  score  function  and  its  derivative.  Under 
suitable  regularity  conditions,  a  Taylor  series  expansion  of  about  a  yields 

0  =  *?>(£)  +  (aA  -a)'  ('¥?)(a)  +  op(n))  .  (4.2) 

One  difference  between  (4.2)  and  the  usual  log-likelihood  expansion  is  that  scaling  d^j-a 
by  n  (rather  than  VJT )  leads  to  a  non-degenerate  asymptotic  form. 

Since  a  a ,  -  —  '¥jf\ a)  1(a)  and  ¥„  ( a)  =  0 ,  we  may  rearrange  (4.2)  to 


obtain 


El  (A)  =  -n(aA  -a) 

=  (-/(dr1  +  *,(«)  ^(a) 

=  -/(dr1  (V?>(a)  -  Y„(d))  +  o,(l) 
=  -/( arl(Hf^ka)  -  ▼„(«))  +  o-(l) 


Combining  (4.3)  and  (3.5)  gives  the  asymptotic  form  of  n  DC  (A )  for  general  subset 


deletions  A 


nDC(A)  =  A(A,a)'l(a)~lA(A,a)  +  op(\) 


where 


A(  A ,  a)  =  TiA)(a)-'Pn(a)  . 


Hence,  the  expected  asymptotic  diagnostic  for  coefficients  is  given  by 


V  s 
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EDC(A)  =  E  [  lim  A( A  ,  a) /  ( a )“ 1  A(  A  ,  a)  ] 


Our  problem  is  now  reduced  to  computing  the  difference  A(  A  ,  a)  between  the  score  func¬ 
tion  with  and  without  subset  A  included,  and  evaluating  the  expectation  in  (4.6). 


4.2  Expected  Asymptotic  Diagnostic  for  Innovations  Variance 

In  the  same  spirit  as  in  (4.1),  we  shall  use  the  expected  asymptotic  diagnostic  for  the 


innovations  variance 


EDV  (A )  =  E  [  lim  n  DV(A)]. 


Denote  the  score  function  for  o2  and  its  derivative  by  XF„  (a  2 )  and  (a  2 ) ,  respectively, 
and  denote  these  functions  with  subset  A  removed  by  'F^i4)(a2)  and  vF^A)(o2) .  Then 

n(62-£2)  =  -±-4'<A)«J2)  '¥jlA\62)  +  op(  1)  (4.7) 

*  - 

’  'j-l 

where  — 'F^'4)(o2)  =  2o^  +  o_(l)  =  2ct4+o„(1).  From  this  and  the  definition 

n  r  r 

»  - 

(3.7),  applied  to  general  subset  deletions  A  ,  we  have 

•  ■> 

nDV(A)  =  -4r  «2(02-aA2)2 


=  2  6^  [4^>(a2)]2  +  o.(l) 


=  (2a4)(vF^)(a2)-'Fn(a2))2  +  o,(l)  . 
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Then  with 

A(A;o2)  =  ^>(o2)  -  ^(c2)  (4.8) 

we  have 

EDV (A )  =  2c4  E  (  lim  A2(A  ;  a2)) ,  (4.9) 

and  our  problem  is  reduced  to  computation  of  A(A  ;  c  ) ,  and  evaluating  the  expectation  in 
(4.9). 


4.3  AO  Models:  AR(1)  Case 
Expressions  for  EDCAO  (t) 

We  now  compute  EDC  (t)  =  EDC  (A)  |A  =  ,  and  EDV  (r )  =  EDV  (A)\A=t  for  the 
AR(1)  model  with  AO  type  outliers.  Let  xt  denote  an  outlier  free  Gaussian  process. 
Straightforward  algebra  (see  Appendix  B)  shows  that  for  the  outlier  free  process ,  the  differ¬ 
ence  in  the  score  functions  for  <t>  with  and  without  x,  is  given  by 

A(r;<|>)  =  -  ^  T - T  [  +x  t  (*,_i+xl  +  l) 

1+<T  c r4 

-  (*r-i  +  *,  +  i)2]  •  (4.10) 

(i+rr 

A  pleasant  feature  of  (4.10)  is  that  A(r  ;<{>),  and  hence  EDC(t),  depends  only  on 
x  t-itX  t  ,x  ,  +  More  generally,  for  AR(p)  models,  A(r,  a)  =  '¥%)  (a)-vF„  (a) , 

depends  only  on  xt  _p  ,  x(  _p  +  1 . x,  +p  .  Replacing  xr  by  y,  in  (4.10),  we  can  derive 

the  difference  in  the  score  functions  for  various  outlier  models. 

First  consider  the  case  where  y,  is  observed  with  a  single  AO  type  outlier  at  t0  ;  i.e  . 
behaves  according  to  (3.8)  with  z,  =  1  only  at  t  =  r0  : 


jjfiC 


-  V  -  * 


- ->  v--...  w  -  »■ » ■  ■-  y^y  «  ■ » 7-.»y  ■  w^rr*  yy 


26 


p,  '  *  'o 

*  =  {*,  +  ?  r  =  ,„  .  <4"> 

The  difference  in  the  score  function  A(r0;  6)  when  we  leave  y(Q  out,  for  example,  is  given 
by  (4.10)  with  xto  replaced  by  x,0  +  £.  Similarly,  for  leaving  yfQ+1  out  we  evaluate 
A(r0  +  l,<j>)  by  setting  r  =  r0  +  l  in  (4.10)  and  again  replacing  xf0  with  x,o  +  ^.  The 
expected  asymptotic  diagnostics  are  then  computed  by  substituting  the  appropriate  expres¬ 
sions  in  (4.6)  and  taking  expectations. 

We  begin  by  computing  EDC(t)  at  the  location  r  =  r  0  of  the  additive  outlier.  Using 
the  notation  A^^ (/;  4>)  in  place  of  A(r;< |>)  when  A( r ;  d>)  is  computed  under  an  AO 
model  with  an  outlier  of  size  £  at  time  tQ  ,  we  have 


o)  (‘O’  ^  ~  ~ 


(i+r) 


[  C)2  +  (X'+  O(xr0- 1  +-rr0+ 1)  (4. 12) 


4>- 

(l+<!>2)2 


(^o-l+^o+l)2l 


For  the  AR(1)  model,  7(<j>)  =  ( 1  -  <J>2  )  1 .  Substituting  this  and  (4.12)  into  (4.6),  along  with 
some  tedious  algebra  (see  Appendix  B),  yields 


(ro) 


fi 


'  4 

4>2  ( 1  —  <|>2 )  + 


1-<J>2)  + 


icl-A 

(  i+4>2)2 


(4.13) 


where  the  notation  EDC $?,0) (t )  parallels  that  for  A^,o)(r;  ()>). 

To  examine  the  effects  of  smearing,  we  now  compute  EDC*°l(3){t)  for  t  *  r0  .  It  is 
shown  in  Appendix  B  that,  as  one  might  expect,  EDC*°t^(t)  is  given  by  the  right-hand 
side  of  (4. 13)  with  £  =  0  for  t  *  tQ  -  l,r0,  to  +  1 »  which  is  the  expected  diagnostic  for  the 
noise  free  process.  Thus  it  suffices  to  compute  the  expected  diagnostics  for  t  -  t0  -  1  and 
r  =  t0  +  1 .  Furthermore,  by  inspection  of  (4.10),  it  is  evident  that  the  effect  of  an  outlier  at 
r0  is  symmetric:  EDC*Pt<j)  (r0-  1)  =  EDC*°,a)  (f0+  1)  .  Hence,  we  need  only  concern 


ourselves  with  EDC  ^.fo)  ( 1  o  +  1 ) »  computations  similar  to  those  above  yield 


EDC^{t^\)  = 


<•  ■" 

%2(l-02)  ( 

r  > 

A 

2 

1-8 

to 

/ - 

o 

a 

(1+02)4 

a 

L  J 

(1+02)3  , 

(4.14) 


2  ( 1  -  <t>2 ) 
(1+02)2 


Expressions  for  EDVa0  ( t ) 

To  obtain  EDVfff,^  (r)  under  the  AO  model  we  compute  the  difference  in  the  score 
functions  A(r;  a2)  for  a2  given  by  (4.8)  with  A  =t  and  apply  (4.9).  Using  the  same 
notation  as  in  (4.12)  we  have  for  t  =  r0  (see  Appendix  C) 

-  ^r  +  ^7  [-<1+«,2H*r.  +  ;)2  (4.15) 


+  2*(x/#  +  C)(x,-_i+x,0  +  i) 


- ^-T  (*r0-l+*r0  +  l)2 


1+0“ 


and  (again  with  tedious  algebra) 

<'o>  - 


*  « 

4  (1+02)2  + 

r  * 

1 

a 

to  to 

2 

a 

8(1  +02)  +  1  . 


(4.16) 


As  with  £DC(£?,o)(r) ,  for  r*r0-l,  r0>  ro+l>  EDVAjPt^  (t)  is  simply  the 
right-hand  side  of  (4.16)  with  £  =  0.  Also,  the  same  symmetry  relations  hold  for 
EDVyP(o)(t),  so  that  EDVAgto)  (r0  -  1)  =  £DV^?f#)  (r0  +  1),  where  straightforward 


computations  give 


Comparisons  of  EDCA0  (r)  and  EDVA0  (r) 

Although  the  dominant  terms  in  (4.13),  (4.14),  (4.16),  and  (4.17)  are  proportional 

,  the  coefficients  for  this  term  are  uniformly  smaller  for  (4.17)  than  for  (4.14)  relative 

to  (4.16)  and  (4.13).  However,  the  difference  between  the  behavior  of  EDCAO  (r)  and 
EDVa0  (r)  is  best  seen  graphically. 

Figure  4a  plots  £DC£?4;fa)(r)/ 100  curve  (i.e.,  the  expected  asymptotic  diagnostic 
assuming  an  outlier  of  size  4  at  time  r0)  for  t  =  r0-3,  r0-2,  •••,  r0  +  3  with 
0  =  .3,  .6,  .9.  Figure  4b  gives  the  corresponding  plot  for  EDVA+4;tg)(t )/100.  The  scal¬ 
ing  factor  of  1/100  approximates  the  expected  value  of  the  diagnostics  for  a  sample  size  of 
100.  The  asymptotic  approximations  verify  what  was  observed  in  Example  1  for  AO 
models:  the  smearing  is  worse  for  DC  ,  and  DV  tends  to  be  more  sensitive. 

Due  to  sampling  fluctuation,  the  patterns  of  diagnostics  observed  in  Example  1  differ 
from  the  expected  diagnostics  in  two  regards:  the  magnitude  of  DC  and  DV  is  larger  than 
EDC  and  EDV  ,  and  the  pattern  over  time  for  DC  is  not  the  same  in  that  the  largest  diag¬ 
nostic  is  for  the  time  point  after  the  outlier  ( t  =  28  ). 

In  Figure  5a,  we  compare  the  amount  of  smearing  graphically  for  DC  and  DV  as  a 

EDCtf. i;,o)  ('o-  1)  EDV$.,#  +  1)  (r0  -  1) 

function  of  0 .  The  ratios  - — -  (solid  line)  and  - jp - 

EDCA°.lo)(t0)  EDVA°4.tg)  (r0) 

(dashed  line)  represent  the  proportional  amount  of  smearing  for  an  outlier  of  size  4  at  1 0 

These  ratios  are  always  less  than  unity.  However,  the  expected  asymptotic  smearing  for  DV 

is  small  in  absolute  terms  for  all  0,  and  also  substantially  smaller  than  that  of  DC  for  all 

but  quite  large  values  of  0  .  The  smearing  for  DC  is  greater  than  .5  for  a  large  range  o'  c 


values,  and  this  suggests  that  in  such  situations,  smeanng  may  lead  to  some  confusion  when 
examining  DC  . 


The  potential  for  confusion  in  fact  becomes  unquestionably  serious  in  situations  where 
there  is  more  than  one  outlier  present.  We  demonstrate  this  in  the  very  simplest  context. 
Suppose  yt  is  observed  with  two  isolated  AO  type  outliers  of  size  C,  at  times  r  0  —  1  and 
r0  +  l.  Note  that  for  r  *  tQ  ,  the  expected  asymptotic  diagnostics  EDC(°to_lxla  +  l)  (t) 
and  EDVfglQ_l  (o  +  l)  (r)  are  given  by  (4.13),  (4.14),  (4.16),  and  (4.17)  above  (since  only 
yt-\>  yt  ,  ami  yf  +  i  enter  in  (4.10)  and  (4.15)).  Also,  it  is  easy  to  show  that  as  a  conse¬ 
quence  of  the  symmetry  in  the  expected  asymptotic  diagnostics  for  AO  models, 
KCg..,,, =  £DC?£„-1)<‘0>  md  HJVjg, (t0) 

=  ££V'$£f#_1)  (fo)-  That  is>  die  smearing  effect  of  an  AO  type  outlier  is  additive: 
outliers  of  size  £  at  to-1  311(1  ro  +  1  are  equivalent  to  an  outlier  of  size  2£  at  r0-l. 


To  see  just  how  serious  smearing  can  be  in  this  situation,  consider  the  case  where 
there  are  outliers  of  size +4  at  r0  - 1  and  r0  +  l.  Figure  5b  exhibits 


EDC^j.  ,a_ ! ,fo4.1)(r0  ) 
EDC (+4; t0_  i,/0  + 1 )  ( ro  ~  1 ) 


+  f0  -  1,  f0  +  1)  ( f  o  ) 

£DV('f4.to_1>f#  +  1)(r0  - 1 ) 


(solid  line)  and 


(dashed  line)  as  a  function  of  <)>.  The  expected  asymptotic  value  of  DC(r0  )  with  outliers 
at  to-1  3,1(1  ro  +  1  is  larger  than  the  expected  asymptotic  diagnostic  at  either  outlier  posi¬ 
tion  for  all  <}> ,  and  has  a  maximum  value  almost  six  times  larger!  Thus,  DC  will  be  totally 
ineffective  in  revealing  such  a  configuration  of  outliers.  By  contrast,  the  ratio  for  EDY 
stays  below  one  for  all  0 ,  and  is  substantially  smaller  than  one  except  for  values  of  !  0  near 
one.  One  therefore  expects  DV  to  be  far  superior  to  DC  in  revealing  such  outlier 
configurations. 


4.4  10  Models:  AR(1)  Case 


The  analysis  of  smearing  for  10  models  parallels  that  for  AO  models.  However,  since 
the  outlier  occurs  in  the  innovations  of  the  process,  the  difference  in  the  score  functions  for 
<J>  L  not  symmetric,  as  was  the  case  for  AO  models.  Suppose  y,  is  observed  with  an  IO 
type  outlier  of  magnitude  £  at  r0  ,  i.e,  e,  is  given  by  (3.9)  with  z,  =  0  except  at  r  =  1 0 
where  z,  =  1 .  If  x  ,  represents  the  series  without  the  innovations  outlier,  then  it  is  easy  to 
check  that 


yt 


X,  ,  t  <  t0 

xt  .  tz  t0 


(4.18) 


Details  concerning  the  calculation  of  the  expressions  to  follow  are  provided  in  Appendices  B 
and  C. 


Expression  for  EDC  ,0  (r ) 

From  (4.10)  we  get  the  difference  in  the  score  functions  for  9  with  and  without  y,o : 

<>;♦)  -  (419> 

l  —  9  o 

+  (*r0+0(*f0-l  +  (Jcr0+l  +  <O 


(l+4>2)2 


<t>0)2] 


The  notation  »„>(*;  $  )  is  used  to  indicate  that  we  have  a  single  IO  type  outlier  of  size 
C,  at  time  r0.  It  is  now  straightforward  to  show  that 


(4.20) 


EDC&">)  ('o) 


(1  +  02)4 


/•  « 

1 

2 

2-4>2 

1  + 

o 

o+y2)3  JJ 

2  ( 1  -  <t)2  ) 

( l+4>2  )2 


Although  the  dominant  term  is  still  proportional  to 


it  is  of  order 


o  (<t>5) 


as 


|  <p  |  — >  0 . 

For  r<r0-l,  EDC[°ilo)  ( t )  is  given  by  (4.20)  with  ^aO.  EDC[^.t^  (r0  -  1 )  is 
identical  to  the  counterpart  (4.14)  for  AO  models  (recall  that  (4.14)  is  also  the  value  for 
r  =  r0  -  1 ).  For  r  =  r0  +i  ,  i  =  1,2,  •  •  •  , 


EDC{°.,.)  Oo  +  O  = 


i. 

a 


^-^(l -0)2)3  2  (  1  -  (j)2 ) 

(1+<I>2)2  (l+<t>2)2  ■ 


(4.21) 


Note  that  the  dominant  term  in  (4.21)  is  proportional  to  (  -^-  )  rather  than  (  )  . 


o'  a 

Hence,  the  effect  of  innovations  outliers  on  EDC  for  r  >r0  is  “smaller”  than  that  at 
t  =  1 0  - 1  and  r  =  r  0  and  furthermore  the  effect  dies  out  exponentially  fast  in  t. 


Expressions  for  EDVto  (f) 

Substituting  yt  in  (4.18)  for  x,  in  (4.15)  gives  the  difference  in  the  score  functions  for 


V.V  Afcifcy.'''.' A-'- 

•  -  »  -  ■  -  *  ji  ’  -  *  «  "  -  V  V 


/  v  v.v.V. v. 


/.’a 


VS/o)('o;a2)  =  +  -fr[-(l+02)(x/o  +  C)2 


2az  2a ‘ 


+  2 4>  ( x,  ■ +  C )  ( x,  _ !  +  ( x,  + 1  +  <>i ; ) ) 


TK-i  +  (^o  +  i  +  <K))2]  (4-22) 


Use  of  (4.9)  yields 


r  r  ^ 

EDV{?.ta)  (r0)  =  ±  ±  - 1—  +  1 

^  a  2(1  +4>2)2  a  (l+<|>2) 


(4.23) 


For  r*r0-l  or  r0 ,  EDV^.to^(t)  is  given  by  (4.23)  with  ^i0,  and 
EDV  f£.(g)  ( t0  - 1 )  is  given  by  (4.17),  the  corresponding  expected  diagnostic  under  the  AO 
model.  Unlike  the  AO  case,  EDV (£/a)  ( r0  + 1 )  does  not  depend  on  £ . 

Comparisons  of  EDCl°  (r)  and  EDV10  (r) 

With  IO  models,  the  behavior  of  the  smearing  effects  of  an  outlier  at  r0  differ  even 
more  dramatically  for  EDC  and  EDV  .  For  EDC  ,  the  effect  of  smearing  is  not  restricted 
to  points  immediately  adjacent  to  the  time  of  occurrence  t0  of  an  outlier.  Specifically,  an 
outlier  affects  EDC^,^  (t )  for  all  r£r0-l  (i.e.,  leaving  out  the  previous  point  or  any 
future  point).  By  way  of  contrast,  the  effects  of  an  outlier  at  r0  are  seen  only  at  r0  - 1  and 
r0  for  EDV  ! 

Figures  4c  and  4d  display  ££>C(+4;,0)  (r  )/100  and  EDV®4.to)  (r  )/100  for 
r=r0-3,  r0-2,  •  *  * ,  r0  +  3  with  d>  =  .  3 ,. 6 ,  and  . 9 .  The  severe  smearing  of  DC  at 
r  =  r 0  —  1  is  reflected  in  Figure  4c,  where  £DC(+4fo)(ro_  1)  dominates 
EDC(+4.t)  (r0 ).  This  is  also  demonstrated  by  Figure  5c,  which  shows 


£DC(+4;f,)(r0  - 1 ) 


(solid  line)  and 


EDV{°4;to)(t  o-l) 


EDC{°,.,t)U0)  EDV'°a.u)(,0) 


(dashed  line).  The  ratio 


■r  s " 


ty  /y 
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for  EDC10  is  greater  than  one  for  most  values  of  <?>,  while  the  ratio  for  EDV10  stays  well 
below  unity,  except  for  |  $  |  close  to  1 . 

These  results  extend  to  AR(p)  models:  dominant  values  of  EDC10  can  occur  at  the  p 
consecutive  times  preceeding  an  isolated  outlier.  The  use  of  EDV (r)  is  obviously  preferred 
for  IO  (as  well  as  AO)  situations. 


V 

I 


tf 


c)  EDC:  IO  type  outlier  of 
size  +4  at  time  t 


d)  EDV:  10  type  outlier  of 
size  +4  at  time  t 


Expected  Asymptotic  Diagnostic  Curves 


Figure  4 


*  %  %  w.  -  v  V  A  •;  >/ •;  V  V  ’*  ■  • 


a)  Proportion  of  Smearing  due  to  an  Isolated  Additive  Outlier 


phi 


b)  Proportion  of  Smearing  due  to  Two  Adjacent  Isolated  Additive  Outliers 


c)  Proportion  of  Smearing  due  to  an  Isolated  Innovations  Outliers 
Expected  Asytmptotic  Diagnostics:  Smearing  Plot 
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5.  Diagnosing  IO  Versus  AO 

Notice  that  Figures  lc  and  2c  appear  to  be  very  much  alike,  in  spite  of  the  fact  that  the 
outlier  at  r  =  28  is  an  AO  in  Figure  lb  and  an  IO  in  Figure  2b.  DV  identifies  the  outlier  in 
both  cases,  but  provides  no  additional  information  as  to  the  outlier  type.  On  the  other  hand, 
the  distinctively  different  behavior  of  DC  for  the  two  cases  allows  one  to  use  the  auxiliary 
information  contained  in  DC  to  help  decide  whether  an  outlier  is  IO  or  AO:  If  DC  is 
smeared  to  both  sides  of  the  time  of  occurrence  of  the  outlier  as  identified  by  DV ,  then  the 
outlier  is  probably  AO.  If  DC  is  large  prior  to  the  outlier  time  identified  by  DV ,  but  small 
at  the  outlier  time,  then  the  outlier  is  probably  IO. 

A  more  formal  way  of  determining  whether  an  outlier  identified  by  DV  is  IO  or  AO  is 
to  use  a  robustified  version  of  Fox’s  test  (1972),  as  described  in  Martin  and  Zeh  (1977),  See 
also  the  non-robust  use  of  Fox  type  tests  in  an  outlier  identification  and  model  fitting  scheme 
proposed  in  Hillmer,  Bell  and  Tiao  (1983). 

A  less  formal  way  of  determining  whether  an  outlier  is  IO  or  AO  is  to  examine  a  lag- 1 
scatter  plot  of  the  residuals.  As  was  pointed  out  by  Martin  and  Zeh  (1977),  IO’s  tend  to  fall 
near  the  abscissa  and  ordinate  of  such  a  plot,  whereas  AO’s  tend  to  appear  away  from  the 
abscissa  and  ordinate,  assuming  that  robust  parameter  estimates  have  been  used  to  form  the 
residuals.  In  the  present  context,  we  recommend  using  the  parameter  estimates  obtained 
with  the  outliers  identified  by  DV  deleted. 

Figures  6a  and  6b  display  the  lag-1  residuals  plot  for  the  data  of  Figures  la  and  2a 
respectively.  The  circled  points  are  (e21,e2%)  and  (e2g,  e 29),  where  e,  is  the  one-step 
ahead  prediction  residual  for  time  t  .  As  expected,  the  (e2g,e29)  point  for  AO  falls 
further  from  the  abscissa  than  in  the  IO  case.  In  particular,  for  the  IO  case,  the  ordinate 
value  is  well  within  the  bulk  of  the  data,  while  in  the  AO  case,  it  is  near  the  extreme  lower 
range  of  the  data  (it  is  the  third  smallest  value). 


37 


Combining  the  DC  diagnostics  with  the  lag-1  scatterplots,  we  obtain  a  convincing 
graphical  display  identifying  the  type  of  an  isolated  outlier.  However,  when  a  patch  of 
outliers  is  present,  these  techniques  do  not  directly  apply,  and  determination  of  the  outlier 
type  is  a  more  subtle  problem. 
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6.  Overall  Strategy 

In  this  section  we  present  an  overall  strategy  for  AREMA  model  fitting  using  leave-k- 
out  diagnostics.  In  Section  6.1,  to  handle  problems  caused  by  "masking  ",  we  imbed  the 
approach  of  Section  3  for  determining  the  length  of  a  patch  of  outliers  in  an  iterative  deletion 
procedure.  We  discuss  more  flexible  subset  deletion  techniques  in  Section  6.2  to  handle 
cases  where  the  iterative  deletion  procedure  fails.  Finally,  an  overall  strategy  for  model 
identification  and  fitting  is  given  in  Section  6.3. 

6.1  An  Iterative  Deletion  Strategy 

The  masking  of  influential  points  (e.g.,  outliers)  by  other  influential  points  is  a  problem 
encountered  in  all  types  of  diagnostics.  As  we  have  already  seen,  masking  caused  by  a  sin¬ 
gle  patch  of  outliers  can  be  handled  adequately  by  leave-k-out  diagnostics.  However,  some¬ 
times  the  presence  of  a  gross  outlier  will  have  sufficient  influence  so  that  deletion  of  aberrant 
values  elsewhere  in  the  series  has  little  effect  on  the  estimate.  More  subtle  types  of  masking 
occur  when  moderate  outliers  occur  in  close  proximity  to  one  another.  These  types  of  mask¬ 
ing  can  often  be  effectively  uncovered  by  an  iterative  deletion  process  which  consists  of 
removing  suspected  outliers  from  the  data,  and  recomputing  the  diagnostics. 

To  deal  with  problems  caused  by  masking,  we  build  upon  the  initial  patch  length  deter¬ 
mination  strategy  of  Section  3  as  follows: 

Step  1 

Run  leave-k-out  diagnostics  on  the  data,  for  k-  1,2,...,  until  either:  (a)  the 
length  of  the  most  influential  (significant)  patch  is  determined  using  the  guidelines 
of  Section  3,  or  (b)  k  =  K ^ ,  where  K ^  is  determined  by  the  user.  In  principle. 
K ma*  is  the  length  of  the  longest  patch  of  outliers  thought  to  be  present  in  the 
data.  However,  computational  costs  require  that  K  miX  is  reasonably  small  (see 
run  time  results  in  Section  8.2;  for  “short”  time  senes,  i.e.,  n  <  250,  se"  ■ 
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^m*x=5  will  often  reveal  most  if  not  all  problems  with  the  data).  Case  b  can 
result  from  two  possibilities:  either  no  influential  observations  were  detected,  or 
the  length  of  an  influential  patch  is  ill-determined  (according  to  the  guidelines  of 
Section  3).  In  the  latter  case,  which  may  be  due  to  the  presence  of  a  patch  of 
length  greater  than  K we  determine  the  “most  influential’’  patch  as  that 
corresponding  to  the  most  significant  diagnostic. 

Step  2 

If  no  influential  points  are  found,  then  conclude  the  analysis.  If  influential  points 
are  found,  then  delete  the  most  influential  points  as  identified  in  step  1 ,  and  go 
back  to  step  1 .  The  new  leave-k-out  coefficients  should  be  scaled  according  to  the 
MLE  computed  with  the  outliers  removed,  so  as  to  gauge  additional  influence  of 
the  remaining  points ! 

The  next  two  artificial  examples  illustrate  the  efficacy  of  the  iterative  deletion  pro¬ 
cedure  in  handling  problems  caused  by  masking. 

Example  33  (continued): 

With  the  above  modified  guidelines,  we  would  identify  y  l5  as  an  outlier  after  running 
leave- 1 -out  and  leave-2-out  diagnostics  (see  Figures  3b  and  3c).  Performing  iterative  dele¬ 
tion,  we  “remove’’  y  15  from  the  data  (i.e.,  treat  y  l5  as  missing)  and  recompute  leave-k-out 
diagnostics  for  k  =  1,  2,  3 .  These  are  displayed  for  DV  in  Figures  7a-7c,  and  give  convinc¬ 
ing  evidence  of  a  further  patch  of  3  outliers  centered  at  t  -  61 .  Note  that  the  pattern  of  diag¬ 
nostics  after  iterative  deletion  is  nearly  identical  (except  for  values  associated  with  y  l5 )  to 
the  original  set  of  diagnostics  (cf.  Figures  3b-3d).  However,  the  magnitude  of  the  diagnos¬ 
tics  is  much  larger  after  iterative  deletion!  This  is  quite  typical:  a  non-adjacent  outlier! si 
will  mask  other  outliers  by  decreasing  the  magnitude  of  the  diagnostics,  but  not  altering  the 


pattern. 


Example  6.1 :  Simulated  MA(  1 ),  9  =  -  .5,  AO  Model  with  1  Patch  and  1  Adjacent  Isolated 
Outlier 

The  data  is  the  same  as  that  used  in  Example  3.3,  except  that  the  isolated  outlier  is 
moved  from  point  15  to  point  58,  adjacent  to  the  patch  of  outliers  at  points  60-62.  The  data 
is  plotted  in  Figure  8a.  Leave- l-out,  leave-2-out,  leave-3-out  and  leave-4-out  diagnostics  for 
DV  are  given  in  Figures  8b-8e. 

The  masking  is  much  more  severe  than  in  Example  3.3:  the  isolated  outlier  is  now 
completely  masked  for  the  leave- l-out  case  (cf.  Example  3.3),  though  the  patch  still  shows 
up  prominently  in  the  leave-3-out  diagnostics.  However,  leave-4-out  diagnostics  are  only 
slightly  more  significant  than  leave-3-out,  and  the  pattern  of  smearing  is  consistent  with  a 
patch  of  3  outliers.  So  following  our  strategy,  we  delete  points  60-62  and  recompute  the 
diagnostics.  The  isolated  outlier  is  now  easily  identified  by  the  recomputed  leave- l-out  diag¬ 
nostics  of  Figure  8f:  removal  of  the  patch  eliminates  the  masking  problem. 

6.2  Local  Stuctural  Changes  and  Flexible  Subset  Deletion  Techniques 

Until  now,  we  have  concentrated  on  influential  points  in  the  form  of  outliers.  However, 
influential  points  may  also  be  due  to  other  types  of  disturbances,  such  as  level  shifts  or  vari¬ 
ance  changes.  The  iterative  deletion  procedure  of  Section  6.1  is  often  effective  for  uncover¬ 
ing  these  types  of  problems.  However,  the  procedure  will  sometimes  fail  in  the  presence  of 
an  influential  patch  longer  than  K  miU;  an  example  of  this  is  provided  in  Section  7.  Masking 
may  prevent  a  long  patch,  or  any  points  in  the  patch,  from  being  detected.  Even  when  the 
patch  is  detected,  if  the  disturbance  spans  a  time  period  considerably  greater  than  K  max .  the 
iterative  deletion  may  require  an  intolerable  number  of  iterations.  To  handle  these  failures, 
we  adopt  a  data  oriented  "free  and  easy”  approach  to  flexible  subset  deletion. 

We  consider  deviating  from  the  iterative  deletion  procedure  and  using  the  fiev.h'-.* 
approach  primarily  in  two  kinds  of  situations.  First,  when  examining  the  data  and  • 
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residuals,  the  analyst  may  suspect  a  structural  change  in  the  data.  Second,  the  leave-k-out 
diagnostics  may  indicate  at  a  local  disturbance  of  duration  greater  than  K max,  (e.g.,  as  when 
the  patch  length  is  ill-determined;  see  step  1  of  Section  6.1).  In  either  case,  flexible  subset 
deletion  techniques  can  help  identify  the  structure  more  precisely. 

An  attractive  way  of  carrying  out  flexible  subset  deletions  is  through  the  use  of  interac¬ 
tive  graphics  on  a  computer  workstation.  Candidate  subsets  are  identified  on  computer 
graphics  plots  of  the  data  and/or  the  residuals,  and  DV  is  computed  for  such  subsets.  For 
example,  if  the  analyst  believes  a  local  level  shift  is  present  somewhere  between  the  times  r0 
and  t  j,  then  he/she  would  compute  DV  for  a  judicious  selection  of  patches  between  r0  and  t  j 
in  order  to  clarify  the  jump  points.  This  procedure  may  easily  be  carried  out  with  the  aid  of 
a  mouse  and  appropriate  software  (see  Section  8.2). 

A  non-interactive  and  computationally  expensive  approach  is  to  run  leave-k-out  diag¬ 
nostics  on  the  data  for  selected  values  of  k  between  K ^  and  n/2.  A  “top  down’’  approach 
for  selecting  k  is  to  use  k  -  ( n  12  ] ,  [  n  /4 ] ,  .  .  . ,  [nl2r  )  where  r  is  the  largest  integer  such 
that  n/2r  >Ktaix  ■  An  alternative  “bottom  up”  approach  is  to  choose  k  =  2s  ,  2J+1 ,  2' 

where  s  is  the  smallest  integer  such  that  2,£/fnux  and  t  is  the  largest  integer  such  that 
2‘  <  n /2.  From  these  diagnostics,  the  disturbances  can  often  be  clarified. 

Another  application  of  the  “bottom  up”  or  “top  down”  diagnostics  is  as  a  final  check 
on  the  model;  see  Section  6.3. 

6.3  Model  Identification  and  Overall  Strategy 

The  foregoing  analysis  presumed  that  the  degree  of  differencing  and  the  order  for  the 
model  was  known.  In  practice,  this  is  rarely  the  case,  and  the  model  must  be  determined  b> 
some  criteria  such  as  the  Box  Jenkins  identification  procedure.  However,  outliers  may  cuu>e 
improper  model  specification.  To  handle  order  selection  in  the  presence  of  outliers  j:m 
structural  changes,  we  embed  the  iterative  deletion  strategy  in  an  iterative  procedure  s :  ~ 


to  that  used  by  Tsay  (1986). 


Overall  Strategy 

Step  0:  Tentative  Identification 

Using  the  Box  Jenkins  methodology,  determine  a  tentative  model.  This  involves  speci¬ 
fying  the  degree  of  differencing  and  selecting  the  order  of  the  ordinary  and  seasonal 
ARMA  components. 

Step  I:  Iterative  Deletion 

Perform  leave-k-out  diagnostics  using  the  iterative  deletion  strategy  of  Section  6. 1 
until:  no  additional  influential  points  are  detected,  or  a  model  change  is  suspected. 
Recall  that  the  second  situation  may  be  triggered  in  two  ways,  as  described  in  Sec¬ 
tion  6.2.  In  the  first  case,  go  to  step  II,  while  in  the  second  case,  go  to  step  IV. 

Step  II:  Confirming  Model  Order 

With  the  observations  identified  as  influential  removed,  i.e.,  treated  as  missing  data, 
determine  the  order  of  the  model  once  again.  If  the  same  model  is  selected,  then  go  to 
step  HI.  Otherwise,  remove  the  influential  observations  and  go  back  to  step  I. 

Step  HI:  Final  Check 

To  ensure  a  longer  patch  is  not  missed,  perform  the  “bottom  up”  or  “top  down” 
approach  as  described  in  Section  6.2.  If  nothing  is  revealed,  then  conclude  the  analysis 
in  the  usual  way.  On  the  other  hand,  if  a  structural  change  is  detected,  then  go  to 
step  IV. 

Step  IV:  Handling  Structural  Changes 

Split  the  data  according  to  the  conjectured  model  change  point(s).  Analyze  separately 
those  parts  which  are  sufficiently  long.  That  is,  go  to  step  0,  treating  each  sufficiently 
long  section  of  data  as  different  series,  and  ignoring  any  segments  which  are  longer 
than  K  max,  but  not  long  enough  to  warrant  model  fitting. 


Analysis  of  Residuals  and  Influential  Points 

In  the  presence  of  outliers  and  structural  changes,  the  usual  prediction  residuals  are 
often  misleading  for  identifying  the  influential  observations.  Instead,  we  recommend  exa¬ 
mining  the  residuals  based  on  the  predictions  formed  when  the  observations  identified  as 
influential  are  treated  as  missing.  Since  the  predictions  are  not  distorted  by  influential  obser¬ 
vations,  this  procedure  reveals  outliers  and  structural  changes  more  clearly. 

After  selecting  the  “final”  model,  a  careful  analysis  of  the  influential  data  points 
should  be  carried  out.  Of  particular  interest  is  the  determination  of  any  physical  causes  or 
events  related  to  such  points.  Also,  one  may  be  able  to  categorize  influential  points  as  iso¬ 
lated  or  patches  of  outliers,  or  perhaps  associate  them  with  a  level  shift  or  variance  change. 
Points  diagnosed  as  an  outliers  can  be  further  classified  by  type  (AO  versus  IO)  using  the 
techniques  described  in  Section  5. 

Use  of  Intervention  Analysis 

A  variety  of  structural  changes,  such  as  outlier  patches,  level  shifts,  and  even  variance 
shifts,  which  may  be  detected  by  the  leave-k-out  strategy,  can  be  handled  by  intervention 
analysis,  as  in  Box  and  Tiao  (1975).  The  prediction  residuals  for  local  structural  changes 
provide  information  which  may  suggest  a  small  palette  of  intervention  "shapes".  We  note 
that  the  diagnostics  may  suggest  intervention  analysis  which  might  be  otherwise  overlooked 
because  the  investigator  was  unaware  of  any  particular  "cause”  (e.g.,  policy  change). 


7.  Applications  to  Real  Data 


In  this  section,  we  analyze  two  economic  time  senes  using  the  strategy  articulated  ;r. 
Section  6.  The  first  series  is  relatively  well  behaved,  except  for  several  patches  of  outliers 
The  second  is  more  difficult  to  model,  since  it  contains  several  local  nonstationanues  and 
disturbances,  including  level  shifts  and  a  variance  change.  For  brevity,  we  omit  the  details  of 
model  selection  in  the  examples  to  follow,  and  concentrate  instead  on  the  diagnostics. 


Example  7.1 :  Exports  to  Larin  American  Republics:  1966-1983 

In  this  example,  we  study  monthly  unadjusted  data  on  exports  from  the  United  States  to 
Latin  American  Republics.  This  series  was  examined  by  Burman  (1985),  who  focused  on 
outliers  and  forecasting  in  U.S.  Census  Bureau  data.  A  plot  of  the  logarithm  of  the  data  is 
given  in  Figure  9a;  the  circled  values  represent  points  eventually  deleted  from  the  series. 
Following  Burman,  we  fit  a.  IMA  (0, 1,2)  to  this  series,  and  the  residuals  from  the  MLE  fit 
are  plotted  in  Figure  9b. 

Leave-k-out  diagnostics  for  k  =  1,2,3  are  displayed  in  Figures  9d-9f  for  DV . 
Again,  for  clarity,  plots  for  DC  are  omitted  in  this  example.  Leaving  out  longer  patches 
reveals  nothing  new,  since  the  series  is  dominated  by  the  outliers  at  1/69  and  2/69  (i.e.,  Jan. 
1969  and  Feb.  1969).  The  effect  on  the  innovations  variance  of  leaving  these  two  points  out 
is  dramatic  ( p  -  value  <  .0001):  see  Figure  9c.  It  is  unlikely  that  a  broader  patch  of  outliers  is 
present  in  this  time  period,  since  leave-3-out  diagnostics  yield  no  increased  significance,  and 
the  smearing  is  consistent  with  a  patch  of  two  outliers.  Note  that  the  plots  also  hint  at 
outliers  in  the  last  quarter  of  1971.  In  fact,  DV ( •  ,2 )  is  clearly  significant  for  other  points 
(e.g.,  10/71  )  and  is  "masked"  since  the  scale  of  the  diagnostic  at  1/69  and  2/69  is  so  large 

Following  the  strategy  of  Section  6,  we  remove  the  points  at  1/69  and  2/69  ar.d 
recorroute  the  diagnostics  for  k  =  1,2,  3,4.  The  results  of  the  first  round  of  iterative  dele 
tion  are  displayed  in  Figures  9g— 9j.  Using  the  guidelines  of  Section  3.  DV  identities  •re 


patch  9/71,  10/71,  and  11/71  as  outliers,  with  a  p-value  <.01  .  Evidence  for  including 
9/71  as  part  of  the  patch  is  weaker  than  for  10/71  and  11/71  :  the  increase  of  DV  ( 3  ,t  )  over 
DV(  2  ,r  )  for  r  =  10/71  is  relatively  small.  However,  the  pattern  of  smearing  in  Figures  9h-9j 
is  more  consistent  with  a  patch  of  three  outliers  than  with  a  patch  of  two.  Hence,  these 
points  are  removed,  and  the  diagnostics  are  recomputed. 

Leave-k-out  diagnostics  for  k  =  1,2, 3,4  for  the  second  round  of  iterative  deletion  are 
displayed  in  Figures  9k-9n.  These  plots  are  noisier,  but  an  influential  patch  at  12/76,  1/77  , 
and  2/77  is  clearly  identified.  The  values  for  DV  warrant  deletion  of  these  points  (p -value 
<  .30),  though  the  “significance”  is  much  smaller  than  in  previous  rounds. 

A  third  round  of  iterative  deletion  was  performed.  The  resulting  leave-k-out  diagnos¬ 
tics  for  k  -  1,2, 3  are  displayed  in  Figures  9o~9q.  Again,  following  the  guidelines  of  Sec¬ 
tion  3,  a  patch  of  outliers  at  1/78-2/78  is  identified,  though  just  barely  (p-value  =  .45).  Note 
that  leave- 1 -out  diagnostics  do  not  pick  up  the  patch:  we  need  leave-2-out  to  identify  these 
points  as  influential. 

Other  potential  outliers  are  indicated  by  Figure  9o  (10/68  and  10/70),  but  are  associated 
with  fairly  high  p -values.  These  points  correspond  to  moderately  large  residuals  (see 
Figures  9b  and  9c),  but  evidently  do  not  significantly  influence  the  estimates  of  the  parame¬ 
ters.  Running  another  round  of  iterative  deletion  (not  shown)  yields  little  change  in  the 
significance  of  these  points  or  any  others. 

In  the  final  analysis,  four  groups  of  outliers  were  identified  and  removed  using  three 
rounds  of  iterative  deletion.  The  points  which  were  deleted  at  each  stage,  and  the 
corresponding  iVfl.E’s,  are  given  in  Table  1.  Removal  of  the  influential  points  results  in  a 
drop  in  the  estimated  innovations  variance  by  an  impressive  factor  of  two.  The  first  tw.; 
groups  of  outliers  (1/69-2/69  and  9/71-11/71 )  correspond  to  dock  strikes  and  forestall. re 
yielding  large  negative  and  positive  outliers  respectively.  The  other  groups  (12/76-2  ”  m 
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1/78-2/78)  have  no  known  cause,  and  exert  considerably  less  influence  on  the  estimated 
parameters.  Burman  (1985)  identified  the  first  two  groups  as  outliers,  along  with  10/68  and 
10/70,  using  the  model  based  methodology  of  Hillmer,  Bell,  and  Tiao  (1983).  Note  that  the 
latter  two  points  show  up  in  the  leave-k-out  diagnostics  (see  Figure  9o),  but  not  so  prom¬ 
inently  as  the  other  patches  at  12/76-2/77  and  1/78-2/78,  which  were  not  identified  by  Bur- 
man.  Once  again,  we  see  the  importance  of  searching  for  influential  patches  as  well  as  iso¬ 
lated  outliers. 


Table  1: 

Parameters  Fit  to  Export  Data 

Iteration 

Step 

6i 

^2 

a2 

Points 

Deleted 

0 

.367 

.160 

.0114 

— 

1 

.460 

.003 

.0083 

1/69,  2/69 

2 

.448 

-.041 

.0066 

9/71,  10/71,  11/71 

3 

.431 

-.058 

.0060 

12/76,  1/77,  2/77 

4 

.43 

-.08 

.0056 

1/78,  2/78 

The  residuals  based  on  one-step  predictions  computed  from  the  data  with  the  outliers 
removed  (i.e.,  treated  as  missing  data)  are  given  in  Figure  9c.  To  obtain  the  predicted  values, 
the  MLE  estimated  with  the  outliers  removed  was  used.  The  general  pattern  is  similar  to  the 
original  set  of  residuals  (see  Figure  9b),  but  with  an  important  difference:  the  large  residuals 
in  Figure  9c  correspond  to  the  points  identified  as  outliers  in  the  above  analysis 
Specifically,  that  last  outlier  in  each  patch,  masked  m  Figure  9b,  shows  up  prominently  in  the 
residual  plot  of  Figure  9c.  Correspondingly,  the  residuals  following  the  patch  of  outliers, 
which  are  large  in  Figure  9b,  reveal  nothing  unusual  in  Figure  9c.  Thus,  the  plot  of  residuals 
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a)  Plot  of  Data 
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b)  Residuals  from  (0,1 ,2)  Fit 
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c)  Residuals  from  (0,1,2)  Fit:  Outliers  Removed 
Example  7 . 1:  Log  of  Exports  to  Latin  America 

Figure  9 
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g)  DV:  Leave-1 -Out  after  One  Round  of  Iterative  Deletion 


1968  1967  1968  1969  1970  1971  1972  1973  1974  1975  1976  1977  1978  1979  1980  1981  1982  1983  1984 


h)  DV:  Leave-2-Out  after  One  Round  of  Iterative  Deletion 
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i)  DV:  Leave-3-Out  after  One  Round  of  Iterative  Deletion 
Example  7, 1:  Log  of  Exports  to  Latin  America 
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I)  DV:  Leave-2-Out  after  Two  Rounds  of  Iterative  Deletion 
Example  7, 1 :  Log  of  Exports  to  Latin  America 

Figure  9 


with  the  influential  data  points  treated  as  missing  provides  a  useful  graphical  display  to  be 
compared  with  the  raw  residual  plot. 

Example  72  :  Value  of  Unfilled  Orders,  Radio  and  TV  (UNFTV) 

Figure  10a  displays  the  monthly  value  (in  millions  of  dollars)  of  unfilled  orders  for 
radios  and  televisions  (UNFTV)  from  1958  to  1981.  This  series  was  previously  studied  by 
Martin,  Samarov,  and  Vandaele  (1983),  who  used  a  robust  ACM  Filter  to  fit  an 
ARIMA(0,l,l)x(0,l,l)i2-  The  series  was  also  analyzed  by  Engle  and  Kraft  (1983),  who  fit 
an  ARCH  model  the  data.  Our  initial  fit  is  an  ARIMA(0,l,l)x(0,l,l)i2;  the  MLE’s  are  given 
in  Table  2  and  the  residuals  from  the  MLE  fit  are  given  in  Figure  10b. 

Leave-k-out  diagnostics  for  k  =  1 ,3 ,5  are  displayed  in  Figures  10c-  10c.  The  diagnos¬ 
tics  reveal  a  gross  outlier  at  9/78  (see  Figure  10c),  which  apparently  masks  the  influence  of 
the  other  neighboring  points:  Figures  lOd  and  lOe  clearly  indicate  the  presence  of  other 
influence  observations  at  the  end  of  the  series.  Following  the  iterative  deletion  strategy,  we 
would  remove  the  outlier  at  9/78  and  recompute  the  diagnostics.  However,  examination  of 
Figure  10b  shows  that  the  end  of  series  has  many  more  large  residuals  than  the  rest  of  the 
series.  It  seems  quite  likely  that  a  variance  change  may  have  occurred  towards  the  end  of  the 
series.  So  instead  of  following  the  usual  procedure,  we  adopt  the  flexible  approach  and  look 
for  the  possibility  of  a  variance  shift 

Using  the  "bottom  up"  (and  computationally  expensive)  approach  described  in  Sec¬ 
tion  6,  we  perform  leave-k-out  diagnostics  for  k  =  16,32 ,64.  The  diagnostics  for  k  =  64  are 
displayed  in  Figure  lOf,  and  dramatically  support  the  conjecture  of  non-homogeneity  of  vari¬ 
ance  present  in  the  series:  the  maximum  value  for  DV  is  over  250  (note  that  this  plot  has  a 
different  scale  from  Figures  10c- lOe).  The  diagnostics  for  k  =  16  and  k  =32  (not  shown) 
display  a  similar  pattern,  although  achieving  a  smaller  maximum  value.  It  is  clear  that  the 
behavior  of  this  series  is  fundamentally  different  towards  the  end  of  the  data.  Thus,  following 
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step  3  of  Section  7,  the  data  is  split  into  two  senes,  and  each  pan  is  analyzed  separately.  We 
chose  1/76  as  the  change  point,  based  on  the  residuals  plot  and  on  computation  of  DV  for  a 
few  judiciously  selected  subsets.  Specifically,  patches  of  increasing  size  were  truncated  from 
end  the  data,  and  the  data  was  split  (approximately)  according  to  patch  of  maximum 
influence. 

Checking  the  model  order  for  the  first  part  of  the  series  again  yielded  an 
ARIMA(0,l,l)x(0,l,l)i2  model.  The  MLE’s  for  this  model  are  given  in  Table  2.  The  resi¬ 
duals  to  this  fit  are  given  in  Figure  lOg.  Note  the  reduction  in  the  estimated  innovations  vari¬ 
ance  from  3123  to  1303.  Leave-k-out  diagnostics  for  k  =2,4,8,  displayed  in  Figures  lOh- 
lOj,  reveal  several  patches  of  influence.  Two  patches  are  espcially  prominent:  one  during 
1968  and  another  in  1972.  The  patch  in  1972,  which  shows  up  only  in  the  leave-8-out  diag¬ 
nostics,  is  associated  with  an  obvious  level  shift  spanning  from  5/72- 10/74  (see  Figure  10a). 
The  patch  in  1968  corresponds  to  a  large  residual  at  6/68,  and  has  a  less  well  defined  struc¬ 
ture.  A  local  level  shift  is  present  during  11/67-5/68  or  during  6/68-10/69,  or  both.  In  any 
case,  the  diagnostics  help  us  identify  problem  areas  in  the  data,  and  show  that  the  large  resi¬ 
dual  is  associated  with  a  patch  of  influential  points  rather  than  an  isolated  outlier. 

A  different  model  was  selected  for  the  latter  section  of  the  scries: 
ARIMA(0,l,l)x(0,0,2)$  was  fit,  and  the  MLE’s  displayed  in  Table  2.  The  residuals  are  plot¬ 
ted  in  Figure  10k  (with  the  abscissa  rescaled  appropriately);  the  circled  points  are  eventually 
deleted  from  the  series.  The  leave- 1-out  diagnostics,  displayed  in  Figure  10/,  reveal  the  iso¬ 
lated  outlier  at  9/78  that  showed  up  prominently  in  the  original  set  of  diagnostics  (see 
Figure  10c).  However,  a  surprising  feature  pops  up  in  the  leave-k-out  diagnostics  for 
k  =  2, 3, 4 ,  given  in  Figures  lOm-lOo.  A  highly  influential  patch  at  2/78-5/78  is  discovered; 
leave-k-out  for  k  =5  (not  shown)  reveals  no  further  significance.  It  is  important  to  note  that 
this  patch  does  not  correspond  to  any  unusually  large  residuals  (see  Figure  10k). 


Following  the  iterative  deletion  strategy,  we  remove  the  patch  2/78-5/78  and  recompute 
the  MLE’s,  which  are  given  in  Table  2.  The  estimated  innovations  variance  drops  from  8960 
to  5340,  and  the  estimated  coefficients  change  dramatically,  having  a  root  on  the  unit  circle. 
Recomputing  the  diagnostics  (not  shown)  reveals  the  influential  point  at  9/78.  Note  that 
9/78  was  significant  in  the  previous  round,  and  the  smearing  was  consistent  with  an  isolated 
outlier  (see  Figures  ll/-llo).  Removal  of  9/78  further  reduces  the  estimated  innovations 
variance,  but  has  a  relatively  small  effect  on  the  estimated  coefficients  (see  Table  2).  No 
more  influential  points  are  uncovered  with  a  second  round  of  iterative  deletion. 


Table  2: 

Parameters  Fit  to  UNFTV  Data 

Time 

Period 

Model 

Step 

«, 

*1 

©2 

a2 

Points 

Deleted 

1958-80 

(0,U)x(0,l,l)12 

— 

.41 

.75 

— 

3123 

— 

1958-75 

(0,l,l)x(0,l,l)i2 

— 

.18 

.92 

— 

1303 

— 

(0,l,l)x(0,0,2)6 

0 

.36 

-.25 

-.51 

8960 

— 

1976-80 

(0,1,1)x(0,0,2)6 

l 

.49 

-.46 

-1.00 

5340 

2/78-5/78 

(0,1,1)x(0,0,2)6 

2 

.36 

-.46 

-1.00 

4173 

9/78 

In  summary,  the  UNFTV  series  clearly  reveals  the  importance  of  leave-k-out  diagnos¬ 
tics  embedded  in  a  good  overall  strategy!  This  approach  effectively  detects  the  major 
modeling  difficulties  present  in  the  data.  A  single  ARIMA  model  is  inadequate  to  represent 
the  entire  series:  the  latter  part  of  the  data  appears  to  behave  according  to  a  different  model 
Also,  the  first  part  of  the  series  is  subject  to  several  local  disturbances,  which  could  be 
modeled  as  level  shifts  using  intervention  analysis.  Finally,  several  very  influential  obser.  a 
dons  are  present  in  the  latter  pan  of  the  data,  affecting  the  estimated  coefnc.e-'  ■ 


dramatically.  Obviously,  any  forecasts  made  will  depend  highly  on  how  these  points  are 
treated. 

With  regard  to  dealing  with  the  variance  shift  in  the  last  part  of  the  series,  the  ARCH 
modeling  approach  of  Engle  and  Kraft  (1983)  appears  to  be  a  viable  alternative. 


c)  Scaled  Leave-1 -Out  Diagnostics:  Innovations  Variance 
Example  7,2:  Unfilled  TV  Orders 

Figure  10 


d)  Scaled  Leave-3-Out  Diagnostics:  Innovations  Variance 


g)  Plot  of  Residuals:  1958-1975 


h)  Scaled  Leave-2-0ut  Diagnostics:  Innovations  Variance 


i)  Scaled  Leave-4-Out  Diagnostics:  Innovations  Variance 
Example  7.2 :  Unfilled  TV  Orders 

Figure  10 


j)  Scaled  Leave-8-Out  Diagnostics:  Innovations  Variance 


I)  Scaled  Leave-1 -Out  Diagnostics:  Innovations  Vanance 
Example  7,2:  Unfilled  TV  Orders 

Figure  10 


m)  Scaled  Leave-2-Out  Diagnostics:  Innovations  Variance 


n)  Scaled  Leave-3-Out  Diagnostics:  Innovations  Variance 


o)  Scaled  Leave-4-Out  Diagnostics:  Innovations  Variance 
Example  7,2  :  Unfilled  TV  Orders 

Figure  10 
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8.  Discussion 

8.1  Extensions:  Other  Measures  of  Influence 

We  have  introduced  two  measures  of  empirical  influence  in  this  paper,  based  on  the 
estimated  coefficients  and  on  the  estimated  innovations  variance.  While  the  former  is  more 
well  established  in  the  ordinary  regression  context,  the  latter  was  found  to  be  more  effective 
at  identifying  outliers  and  other  types  of  influential  observations.  Other  notions  of  influence 
may  also  be  useful;  several  possibilities  are  discussed  below. 

Diagnostics  Based  on  a  Robust  Scale  Measure 

One  attractive  extension,  suggested  by  V.  Yohai,  consists  of  computing  a  robust  scale 
measure  of  the  residuals  computed  from  the  leave-k-out  fit  To  see  why  that  would  be  use¬ 
ful,  recall  that  an  outlier  will  cause  a  large  value  of  DV  for  two  reasons:  first,  the  outlier 
inflates  the  variance  because  of  an  associated  large  residual,  and  second,  it  inflates  the  vari¬ 
ance  by  distorting  the  parameter  estimates,  and  hence  the  fit.  A  robust  measure  of  the  scale 
of  the  residuals  is  resistant  to  outliers,  and  hence  would  reflect  only  the  second  feature. 
Thus,  we  obtain  measure  of  how  the  fit  alone  is  influenced  by  a  patch  of  observations  by 
computing  the  change  in  the  robust  scale  for  the  leave-k-out  residuals. 

Diagnostics  for  Forcasting 

An  important  area  for  application  of  subset  deletion  based  diagnostics  is  in  forecasting. 
One  possibility  is  to  measure  the  influence  of  a  set  of  observations  on  the  forecast  distribu¬ 
tion  by  computing  the  change  in  the  location  and  spread  of  the  forecast  confidence  intervals 
when  subsets  of  points  are  deleted.  It  is  expected  that  the  most  influential  points  for  forecast¬ 
ing  will  occur  near  the  end  of  the  senes.  Interactive  graphical  displays  of  the  change  in  the 
confidence  intervals  will  be  particularly  useful  for  determining  the  extent  to  whicn  suer. 
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influence  exists. 

Another  possible  measure  of  influence  is  to  determine  the  change  in  forecast  mean- 
square-error  when  points  are  deleted. 

Diagnostics  Based  on  Additive  Noise  Variance 

We  have  demonstrated  that  the  influence  of  an  outlier  in  an  ARIMA  model  is  better 
measured  by  a  diagnostic  based  on  estimates  of  the  innovations  variance  rather  than  a 
diagnostic  based  on  the  estimated  coefficients.  This  raises  the  question  whether  or  not  a 
useful  diagnostic  might  be  based  on  estimates  of  hypothesized  additive  noise  variance.  The 
state  space  formulation  of  Section  2  easily  generalizes  to  include  additive  noise  in  the 
measurement  equation  (see  Jones,  19S0).  Let  be  a  sequence  of  normally  distributed 
independent  random  variables  with  mean  0  and  variance  h  a 2  .  Then  the  measurement 
equation  for  an  ARIMA  process  observed  with  additive  noise  is 

x,  =z'x, +  £,  (9.1) 

The  prediction  and  update  equations  remain  the  same,  except  that  the  observation-prediction 
residual  variance  is  given  by 

ft  =  z'pr|r-iz  +  *  •  (9-2) 

The  maximum  likelihood  estimate  of  h  is  obtained  by  inclusion  of  a  parameter  for  h  in 
the  non-linear  optimization  of  (2.10). 

It  is  possible  that  inclusion  of  additive  noise  in  the  model  will  yield  an  even  sharper 
diagnostic  for  outliers,  and  may  be  helpful  in  identifying  certain  model  changes  (e.g.,  level 
shifts).  However,  there  is  one  caveat:  Estimation  of  an  additive  noise  variance  h  c2  when  in 
fact  h  a2  =  0  can  lead  to  seriously  inflated  variances  for  the  other  parameter  estimates  (see 
Section  V  of  Martin,  1980). 
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8.2  Computational  Considerations 

Software  for  computation  of  maximum  likelihood  estimates  for  ARIMA  models  with 
missing  data  is  now  widely  available.  Implementation  of  leave-k-out  diagnostics  involves  a 
simple  extension  of  existing  programs  or  statistical  packages.  The  complexity  of  leave-k-out 
diagnostics  is  n  x  j  non-linear  optimizations,  where  n  is  the  sample  size  and  j  is  the 
number  of  different  k  used  for  leave-k-out  diagnostics.  However,  the  computations  are  not 
as  severe  as  it  might  appear,  since  the  optimizations  can  be  run  with  good  starting  values  and 
an  estimate  of  the  hessian  obtained  from  the  fit  with  all  the  data. 

The  computational  burden  can  be  considerably  reduced  by  iterating  only  one  or  two 
times  in  the  optimization  procedure  (see  Storer  and  Crowley,  1985).  We  have  found  this 
gives  virtually  identical  results  in  many  cases:  See  Figure  11  which  reproduces  the  initial 
leave-one-out  diagnostics  for  Example  7.1  (Figure  9d),  and  superimposes  the  diagnostics 
obtained  when  just  one  iteration  is  allowed 

Some  run  times  in  minutes  for  Examples  3.1  and  7.1  are  given  in  Table  5.  These  com¬ 
putations  were  carried  out  on  a  68020  based  Masscomp  MC5600  at  the  University  of  Wash¬ 
ington.  Restricting  the  optimizer  to  just  one  iteration  reduces  the  computations  by  factors  of 
roughly  two  and  three  in  the  two  examples.  Further  improvements  in  speed  are  possible  by 
computing  analytical  first  and  second  derivatives. 

The  use  of  interactive  graphics  to  mark  subsets  for  deletion  was  mentioned  as  a  tool  for 
use  when  the  iterative  deletion  procedure  breaks  down  (see  Section  6.2).  Interactive  subset 
deletions  can  also  provide  computational  savings  for  those  situations  in  which  as  much  infor¬ 
mation  can  be  gleaned  from  the  data  computing  DV  for  a  few  select  subsets  as  for  the  entire 
data  set  Furthermore,  implementation  of  such  a  procedure  is  relatively  straightforward 
using  current  technology.  We  are  developing  implementions  for  UNIX  t  workstations 


t  UNIX  is  a  trademark  of  AT&T  Bell  Laboratories. 
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a)  Scaled  Leave-1 -Out  Diagnostics:  Full  and  1  Iterations  Allowed 


Example  8.1  (continued):  Log  of  Exports  to  Latin  America 
Diagnostics  Computed  Allowing  Full  Iterations  and  1  Iteration 

Figure  1 1 
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running  the  interactive  statistical  language  and  system  S  (Becker  and  Chanbers,  1984),  and 
for  Symbolics  t  workstations. 


— 

Leave-k-out 

No.  of 
Iterations 
Allowed 

Example  3. 1 

n  =  100 

AR(1 ) 

k  =  1 

1 

full 

CPU  Time 
in 

minutes:  seconds 


Example  7.1 
n  =  216 
AR1MA(0,1,2) 


3:50 

10:01 

3:50 

10:06 


8.3  Scaling  of  Diagnostics 

The  diagnostic  proposed  in  (3.5)  for  DC  is  not  very  satisfactory  when  the  estimated 
coefficients  are  near  the  boundaries  of  nonstationarity  or  noninvertibility.  This  is  because 
1(a),  the  expected  information  matrix,  becomes  singular  in  this  case,  and  inflates  DC.  The 
resulting  diagnostic  is  no  longer  comparable  to  a  x2  distribution,  and  is  too  heavily 
weighted  by  the  coefficients  responsible  for  the  singularity.  One  solution  to  this  problem  is 

*  A 

to  scale  by  HP  ( a )  /  n ,  the  observed  information,  which  hopefully  gives  a  better  estimate  ot 
the  covariance  matrix  for  a  (Efron  and  Hinckley,  1978,  prefer  observed  information  in  the 
case  with  independent  observations).  Computation  of  'P  ( a )  / n  is  more  difficult,  though  it 
can  be  done  explicitly.  Kohn  and  Ansley  (1986)  give  a  state  space  formulation  of  the 
$  Symbolics  is  a  trademark  of  Symbolics,  Inc. 


ARJMA  model  which  includes  a  computation  of  the  first  and  second  derivatives. 

We  have  not  yet  addressed  the  issue  of  "internal"  versus  "external"  scaling  of  the  diag¬ 
nostics  (see  Cook  and  Weisberg,  1982).  Internal  scaling  uses  the  same  norm  for  all  observa¬ 
tions,  while  external  scaling  uses  a  different  norm  for  each  observation,  with  each  norm 
based  on  the  data  excluding  the  observation.  For  simplicity,  we  have  chosen  to  use  an  inter¬ 
nal  norm  in  (3.5)  for  DC  .  However,  we  could  easily  compute  the  externally  scaled  diagnos¬ 
tic 

n  (a-dkt  (a)(a-akJ)  (8.3) 

where  it  ,  (a)  is  the  estimated  information  matrix  for  a*  ,  .  Two  possible  estimates  are 
i*  ,  (a)  =  I(at  , )  or  it  ,  (a)=  xFti,  (a*,,  )/(n-k  ).  As  in  the  case  of  ordinary  regres¬ 
sion  analysis,  internal  scaling  tends  to  obscure  influential  points,  since  an  outlier  will  often 
inflate  the  variance  of  a ,  and  hence  will  decrease  I(  a ).  Thus,  external  scaling  is  preferable 
for  DC ,  and  should  be  used  if  possible. 

For  scaling  the  diagnostic  DV  based  on  the  innovations  (or  observational)  variance,  it 
only  makes  sense  to  use  an  external  norm.  This  is  because  an  outlier  virtually  always 
mcreases  the  estimated  variance,  so  that  the  ratio  ak  ,  /  a  is  usually  less  than  one  for  outly¬ 
ing  observations.  Thus,  if  in  (3.7)  we  scaled  by  the  internal  norm  a4,  instead  of  the  external 
norm  6k  J  ,  then  DV  (k  ,  • )  is  often  not  significant  when  evaluated  for  an  outlier. 

8.4  Diagnostics  for  the  Beginning  of  the  Series 

In  this  paper,  we  have  dealt  with  the  beginning  of  the  series  by  reversing  the  series  so 
that  the  beginning  becomes  the  end.  The  need  for  this  arises  because  the  Harvey  and  Pierse 
(1984)  algorithm  for  missing  data,  which  we  have  used,  does  not  handle  missing  data  at  the 
beginning  of  the  series.  On  the  other  hand,  the  method  of  Kohn  and  Ansley  (1986)  does 
allow  missing  values  in  the  beginning  of  the  series,  and  thus  with  their  method,  a  complete 
set  of  (leave-k-out)  diagnostics  can  be  computed  with  one  pass  over  the  data.  We  hope  to 


implement  the  Kohn  and  Ansley  method  in  the  near  future. 


8.5  Diagnostics  Versus  Robust  Filter  and  Smoother  Cleaners 

The  results  produced  by  the  leave-k-out  diagnostics  are  similar  to  the  prediction  resi¬ 
dual  diagnostics  obtained  from  robust  model  fitting  based  on  robust  filter-cleaners  or  robust 
smoother-cleaners  (see  Martin,  1979,  1981;  Martin  and  Thompson,  1982;  Martin,  Samarov 
and  Van  Daele,  1983).  The  robust  model  fitting  diagnostics  consist  of  looking  for  large 
values  of  the  observation  prediction  residuals  e,  =  yt-yt\t- 1»  where  denotes  the  one- 

step  ahead  robust  prediction  based  on  filter  or  smoother  cleaned  data. 

There  is  in  fact  a  close  connection  between  the  the  prediction  residuals  produced  by 
leave-k-out  diagnostics  and  those  produced  by  a  hard  rejection  type  filter-cleaner.  Using  the 
notation  of  Section  2,  the  hard  rejection  filter  is  defined  by  replacing  the  recursions  for  x, 
and  Pf  in  (2.6)  by 


*r  =  • 

r 

w 

+/,*'*.  i 

i-l  zvt 

if  |  et  |  £c 

if  |  e,  |  >  c 

(8.4) 

P,  =  ' 

* 

Pf|f-1 

Pr|/-1 

1 

> 

i 

►— » 

|f-l»'P,„_1 

if  |  et  | 

if  1  «,  1  >  c 

(8.5) 

where  c  is  some  threshold  value  (for  hard  rejection  filtering,  c  ~  2.6  works  well;  see  Martin 
and  Su,  1985).  One  way  of  looking  at  (8.4)  is  that  a  data  value  y,  corresponding  to  predic¬ 
tion  residual  larger  in  absolute  value  than  c  produces  the  same  result  as  the  Kalman  filter 
with  y,  treated  as  missing.  Thus,  if  the  iterative  deletion  procedure  of  Section  6.1  identifies 
the  same  points  as  those  which  are  rejected  by  the  filter  of  (8.4),  then  the  prediction  residua.'' 
of  the  two  procedures  will  be  identical  if  the  model  parameter  values  are  the  same.  Tire 
latter  will  be  approximately  true  at  the  completion  of  the  overall  strategy.  Hence. 


diagnostics  obtained  from  the  leave-k-out  procedure  will  closely  match  those  resulting  from 
a  robust  procedure  based  on  the  hard  rejection  filter. 

8.6  Related  Work 

Other  approaches  to  the  problems  of  outliers  and  structural  disturbances  in  time  series 
have  been  explored  in  the  literature.  An  approach  proposed  by  Chang  and  Tiao  (1982),  Hill- 
mer.  Bell  and  Tiao  (1983),  and  Tsay  (1986)  is  based  on  iterative  fitting  of  ARJMA  models, 
utilizing  Fox  (1972)  tests  to  decide  whether  an  individual  observation  is  an  IO,  AO,  or  not 
and  outlier.  The  approach  is  easily  extended  to  cover  shifts  in  level  and  shifts  in  variance. 

Another  important  direction  for  dealing  with  model  changes  of  various  types  has  been 
pursued  by  Harrison  and  Stevens  (1976)  and  Smith  and  West  (1983),  who  use  a  Bayesian 
approach.  A  mixture  of  normals  is  used  to  automatically  adapt  the  model  to  outliers  and 
other  local  structures.  West  (1986)  and  West,  Harrison,  and  Migon  (1986)  propose  a  some¬ 
what  different  method  based  on  Bayes  factors,  in  which  a  nominal  model  is  compared  to  a 
“neutral”  alternative. 
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APPENDIX  A:  Computation  of  Asymptotic  Information  Matrix 

This  section  derives  an  analytical  expression  for  the  asymptotic  information  matrix  of  a  station¬ 
ary  and  invertible  ARMA  (p  ,q )  process  $(B  )x,  =  Q(B  )et .  The  formulas  are  easily  extended  to 
nonstadonary  and  seasonal  models.  Let  g  j ,  •  ■  •  ,gp  and  h  j ,  ■  •  ■ ,  be  the  roots  of  the  polyno¬ 
mials  $(B  )  and  8(B ) ,  so  that 

4>(B)  =  (l-glB)(l-g2B)-(l-gpB) 

0(B  )  =  (l-  h  i  B  )(l  -  h  2B )  ■  ■  •  (1  -hxB)  (A1) 

Assume  that  the  roots  are  distinct:  gt  *  gj  and  h^h]  for  i*j  . 

Let  c,  and  di  be  the  coefficients  in  the  expansion  of  <j>(B )~ 1  and  0(B  )-1  respectively;  i.e., 

4>-\B )  =  £  c,  B  ‘ ,  0_1(B  )=  £  dtBl  (A. 2) 

i=0  1  =  0 

The  asymptotic  information  matrix  1(a)  is  given  by 

h,j  =  Z  ckck+j-i  if  IZiZjZp 

k-  0 

h,P+j  =  Z  <*k ck+j-i  if  1 

Jk=0 

-  (A. 3) 

h,p*j  =  Z  ck^k+j-l  if  l£i£p,  l£j£q,  j^i 

k=0 

f p+i,p+j  ~  Z  ^k^k+j-i  if  1 

k=0 

The  coefficients  can  be  computed  recursively  from  the  relation 
1  =  4>(B  MB  )_1  =  0(B  )0(B  )_1 ,  or 

<j>(B)cf=0  Q(B)d,  =0  r=l,2,  •  ■  •  (A. 4) 

Initial  conditions  for  the  recursions  are  c0=l,c_o+1='  ■  •  =  c  =  0  and 

<i0-  1  ,  d_p+ 1  =  •  •=£/_ i  =  0.  Hence,  (A. 3)  provides  and  explicit  expression  for  1(a) . 


S*  V  s  *» 
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By  expressing  (A. 3)  in  terms  of  the  roots  $(5  )  and  0(5  ) ,  the  formulae  can  be  reduced  to  a 
summation  of  a  finite  number  of  terms.  For  r  =  1,2,  •  •  •  ,  a  solution  to  (A. 4)  is  given  by  (see  Box 
Jenkins  [1976]) 


ct  ~  k  1 8  l  +  kp  gp 

d,  =  / 1  h  i  +  •  •  •  +  lq  hlq  (  ’ 

where  k  ; ,  ■  ■  ■  ,kp  and  /  j ,  •  •  •  , lq  are  (possibly  complex  valued)  constants.  Since  c,  and  d. 
can  be  evaluated  recursively,  (A. 5)  defines  a  system  of  linear  equations  which  can  be  solved  for 
k  t ,  •  •  •  ,  kp  and  / 1 ,  ■  ■  ■  ,lq.  Substituting  (A.5)  into  (A. 3),  Fubini’s  theorem  yields 

I i.j  =  £  (  £  kmgmH)(  £kn8n) 

1  =  0  m  =  1  n=l 

=  ii  {k^ngf  £  (  8m  8n  /  ) 

m  =  \  n=l  t=  0 


ifis; 

m=l  n  =  l 


(A. 6a) 


Similarly, 


it.,*  =  X  X  (‘.'.sho-sAi'1)  >f  is; 

m  =  l  n=l 


(A. 6b) 


=  I  I  in** 


m  =  l  n=l 


(A.6c) 


I  £  ^ i  s; 


m  =1  n  =  1 


(A.6dl 


-v-fOy-jv,.  •f-'  -'-'  • 


^ 

•*  - 


SI 


APPENDIX  B:  Computation  of  EDC 


7  6 


The  expected  asymptotic  diagnostics  for  coefficients,  EDC ,  are  computed  for  the  leave-one-out 
diagnostic  in  the  AR(1)  model.  First,  we  compute  A(r0;  0),  the  difference  between  score  function 
for  the  entire  data  and  the  score  function  for  the  data  with  r0  treated  as  missing  for  a  perfectly 
observed  Gaussian  process.  Then,  for  the  AO  and  10  contamination  models  studied  in  Section  4, 
(4.6)  is  evaluated  by  breaking  up  the  computations  into  two  parts,  one  corresponding  to  the  outlier 
free  process,  and  the  other  corresponding  to  the  contamination. 

Let  x,  be  a  perfecdy  observed  Gaussian  process.  For  the  AR(1)  process,  the  score  function  for 

0  is 

*<0)  =  -7  I  ft/ ft  -  TT  I  [2 €tet/ft-et2ft/ft2]  (B.l) 

1  t=\  2c  ,=\ 


where  et  is  the  prediction  residual,  C  2/,  is  the  variance  of  the  prediction  residual,  et  = 


de, 

30 


30 


and  ft  -  ~~  With  no  missing  data,  we  have  e,  =  x,  -0xf_j  and  /,  =  1  for  t  >  1  .  If  xlr 


is  missing,  the  score  function  is  similar  to  (B.l),  except  that  we  drop  the  r0  term  in  the  summations 
and  adjust  the  r0  -+  1  prediction  and  residual.  Define  e*a  =  0  and  /,0  =  1 ,  and  denote  the  predic¬ 
tion  residuals  and  variances  for  when  x,o  is  missing  by  e*  and  C J2/,  If  t0>  1  ,  then 


«r*0+l  =  *i,+l-*2jcf,-l  /*o+i  =  (1+<>2)  (B-2) 

and  e *  =  e,  ,  f*  =  f,  for  t  *r0>  *0  +  1  •  The  score  function  V,o)(0),  with  x,Q  missing,  is 
given  by  (B.l)  with  e,+  and  f*  substituted  for  et  and  f, . 

Hence  for  Iq  >  1 


A,  fro:*)  =  '^'°>(0)-M'n(0) 


/, 


2  o' 


(2^:+1  -e*\xf*+xlft 


* 

0  +  1 


e.  e, 

*  0  ‘  0 


+  e, 


ft . 


0 _ 1_  [  2(^0-O:^8-i)(-2<&xto_i)  ^  (^0^!  -Q2xf0_1)-(2o) 

1  +02  2  a 2  1  +  02  ( 1  +  02)2 


-  ~T  Uo-^o-I^o-t  +  ^'o+l 


1+0-  a- 


<5  1  ,  ,  x  ,  203  -2<D(  1  +o2) 

— T T  xiayxta~\+xt0  +  \)^xt0-\xt^\  — 


d+0": 


2  203(  l  +  gr)-4>5  -( l+d>2)20 

f0_1  d+02)2 


-<Ko  ~xt]  + 


{ l  +rr 


= - - V  -<jw.2  +x.  (x.  +x.  .  , )  -  - - ^  ■  —  (  x.  _ ,  +x,  .  ,  )2  (B.3) 

1+02  a2  [  f0  '°  f°  1  '0+1'  (1+-4)2)2  1  '0  +  1'  J 

which  yields  (4. 10). 

We  are  now  set  to  compute  EDC  for  the  AR(1)  model.  Let  yt  be  contaminated  according  to 
an  AO  or  IO  outlier  model.  It  is  convenient  to  break  up  the  computation  of  EDC  into  two  parts, 
that  corresponding  to  xt  and  that  corresponding  to  the  contamination.  Define  C  $  =  E  A"  ( 1 0;  0) 
where  A(  r0;  4>)  is  the  difference  in  the  score  functions  for  x,  ,  and  is  given  by  (4. 10).  Denote,  for 
now,  the  difference  in  the  score  functions  for  yt  by  Ay(r ;<(>),  and  let  5v(r;0) 
=  Ayi(r;  <t>)-A(r;  <p) .  Then 

£  Ay( ( r ;  4>)2  =  C0  +  £  SJ( ( r ;  <p)2  (B.4) 

since  the  cross  product  terms  vanish.  From  (B.4),  we  can  compute  EDC  by  equation  (4.6). 


The  tedious  part  of  (B.4)  is  computing  C  which  is  shown  below  to  be  equal  to 


( l  +o-)- 


Calculating  £  5y(r;  <$>)  is  easier,  bur.  must  be  done  case  by  case.  These  are  computed  below 
several  outlier  configurations. 


*  V’jy v  *  •*,  V** * -S’”***'- . 


AO  Models 

Consider  first  y,  which  obeys  the  model  given  in  (4.1 1):  a  single  AO  type  outlier  of  size  £  at 
r0.  Paralleling  the  notation  of  Section  4,  denote  Sy, ( f o;  ♦)  by  5(£to)(  roi  0)  •  We  obtain 
8(^0)('o;  0)  by  subtracting  (4.10)  from  (4.12),  which  leads  to 

£(5(1;?ro)(fo;0)2)  =  ^-£U(xfo_l+acIo+1-2<|urlo)-0;2]2 

•  ^4  r  ^2 

=  -2-  02  +  -y  E[4  02x,2  -  40x,o(x,  ,  +x,0+I)  +  (xl0_1  +x,0+1)2] 

c  la  J  0 

•  "i  4  r  ^2 

=  -a-  02  +  -a-  — ^-T(402-802+2(l+02)) 

0  0  1—0" 

+  J  / 

-  "j4  r  ^2 

=  -5-  02  +  -S-  2  (B.5) 

a  o 


Upon  adding  C*  and  scaling  by  (l-02),weget  EDC$lo)(t0‘,  0),  which  is  displayed  in  (4.13). 
Under  the  same  outlier  model,  except  leaving  t0  + 1  (or  t0  - 1 )  out,  we  get 


£(8(fIo,(ro+l;0)2)  =  "C2 


=  — ^77+  U-l  — 1 - ~yy ( 2 0) 4-  (2(l+02)) 

a  (1+02)4  0  1-02  (  1+02)2  (  1+02)4 


402 

(1+02)4 


.  A  *-*U+  i  '-4  i~K 


o  a+rr  o  i-0"  d+0") 


which  leads  to  (4.14). 


For  AO  type  outliers  of  size  £  at  both  f  o  -  1  and  t  o  +  1 , 


£511-1. -^n('o;0)2  =  — y  £  -(20 

a 


(i+rr 


,  ,  .J .  7  ( 20  *l#  2  ,  ,  (x(0_,+x,0+1) 


(i+04r 


=  £6,1gfo)(fO-l;0)2 
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which  is  given  by  (B. 6)  above.  Hence  8(^.1 >/o+1)(r0;  0)  =  EDC  ( ro  ~  1  )• 


10  Models: 

Now  suppose  y,  is  observed  with  an  10  type  outJier  of  size  £  at  r0  and  thus  behaves  accord- 
ing  to  (4.18).  We  obtain  an  expression  for  §/^r0)(ro)  by  subtracung  (4.10)  from  (4.18),  so  that 


£5,'?,o,« =  -t£  -»  +  » 


d+<?2)2 


+  ;  x,o(-20  +  0)  +  (x,o_1+x,o  +  1)  1  ~  TV  2 

1(1+9) 


=  1  — $fi—4-  £  — 1—  02-20 

O  ( l  -m)>2)4  |oJ  l-4>2  I  (l+<t>2)2 


+  1 - 2(  1 +02) 

(1+4>2)2J  l 


l4  .6  f  r  l2 


=  - ~T+  ^  ~ l~T  <t>2-4*2  +  — ^Vl+2(l+02) 

o  (1+02)4  l<*J  1  -02  (1+02)2 


(i+92)  (i+rf 


l4  .6  fr  l2 


S  - £ - +  Jl  — 1 —  2-02  1  -■ 

<*  (1+02)4  M  1  -02 


k2  /  i  ,  a2  > 


+  02)2  +  8  o' 


(1+02)3 


=  1 


)4  ,6  f r  l2 


■+  i  -1 


o  i  d+^r  J  i-<> 


2-0  1  + 


o+rr 


From  this  follows  the  expression  for  EDC  (;,/„)  (f  o)  •  displayed  in  (4.20). 

Evaluating  the  diagnostic  at  / 0  -t- z  ,  we  obtain  8^,oj(  f  q +  0  from  (4.10)  by  replace; 
x,0_l  with  xto..+0‘  *'s.  x,8  with  x,o  +  0‘ ;  and  x,oM  with  x,#+  ,  +  0‘  “  1  s  m  (4  1G 


This  yields 


E  5/?  ,  ,  i  r0  -*-i  ) 


r  /• 

:  =  k  -o-' * = *o;- 


- *VT(0‘ 

(  1  +0‘  )‘ 


+  ^  Xf<)(  -  2  0  1  +0  +0  *)+(*, 0_l+*r„*l>  ^  ~  {Q  H 


=  JL  02i-2  0 - ^-r-r-f<b2+n2 


(i+rr 


(r+ir 


a  1-0- 


-£-  — L^02i“2  (1-02)2  +  2(l-02)  0 - (20) 


d+r) 


+  0 - (  2  ( 1  +  02 )) 


(1+0") 


.  I}2 -L. 02-2  [( 1  -02)2 -  41>2(  1  =££  +  tfu  =P- 

a  i-02  (l+<t>2)  (l+<>2) 

=  JL  02i-2  ( 1  ( l  +  02-402+202) 


v- . 

a  (1+02) 


.  i  *i-2il=£L 

a  (1+02) 


This  leads  to  (4.21). 

Compulation  of  C  § 

It  remains  only  to  compute  C $  s  £  A f|)( 0;  xf  )2 .  Since  the  vector  (xt  _  i ,  xt ,  xt  +  j )  has  a 
multivariate  normal  distribution,  we  can  easily  compute  the  following  useful  expectations. 


E  [  xti  (xt  - 1  +xt  + 1 

E  [  (*»  -  l  +jCt  + 1  >2  ]  =  2  E  fo2*i  -  1  +Jcr  - 1  x?xt  + 1  ] 

_4 

=  2  (  1  +  5  02 )  -  2  - 

(1-02)2 

E[xt{xt_  1  +Xf^;)3]  =  IE  [x(X,3_,  +  3  X,2_  I  x{  X,  +  1  ] 


—  *♦ 

=  2[  30  +  30(l+202)]  - — 

(l-0‘)* 


(B.io: 
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=  124>(l+02)  °  ,  z 

a-r)2 


E  [(^r-l+^r  +  i)4]  =  E  [  2(xlA_i  +  4 x?_ ! + 1 )  +  6 jc,2_  j xt\  t  ] 


=  [2  ( 3  +  12  <j>2)  +  6(  1  +2  <Jf )] 


4\1  O 


(1-02)2 


=  12(1-+-  4>2  )2  — ^~yy 
(1-02)2 

We  are  now  set  to  evaluate  the  expectation  of  the  second  term  (squared)  in  (4.10). 


E  0 Xf“  4--c^(arf_j+jrf  +  j)  _~(att_j  +  arj  +  [) 

(i+r) 


2  62 

=  E  <J>2*,4 -20*, 3  {xt_x  +*,  +  1)  +  - 7T  +  1  x*  (x‘ - 1  +x'  + 1  )2 

(i  +<rr 


=  3  02  -  12  02  +  2  -2V,-  +  1  ( 1  +  5  <)>2 ) 

Ici+r)2 


.24-^rT(1+,2)+i^n±^i  -2^ 

(1  +  02)2  (l+O2)4  ( l  -  4»2  )2 


=  |(  3  02  -  1202  +  2+  1O02)  + 


2\.  _ 1 _ /  a  a2  a2  .  i  ,v2  *  in  a4  a»4  \  _ SL 


d+rr 


( 4  -  24  ^  +  12  r  +  20  04  -  24  04 ) 


o-o-r 


=  [( 2  +  02)  (  1  +  02)2  - 8 02  -  404 ] 


(l  +  02)2(l-02)2 


=  ( 2  -  3  02  +  06 ) 


(l+02)2(l-02)2 


(B.ll) 


Since  E  A,  (r0;  0)  =  0, 


£  0*,  +*f  ( -**  -  i  +  +  i  )  ,  (xf_;+xt  +  |)  ,  (Bl_ 

(l+0‘)  (1+0  ) 


r.V^r/.V 


-.v.v.  .•-V.>.  ^2<,-%--V.\-_v2^.v_v2’^.v2d--'vv'^2v'2v-2A:‘v>y2^2/2v-v2/'2w.>2v_v.-rfv.>2^1v2^„v.v^'.-.  ^.v-v.vO.-  j>.  ./ 


yy 
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APPENDIX  C:  Computation  of  EDV 

The  computations  for  EDV  parallel  that  for  EDC  (see  Appendix  C),  though  we  must 


compute  the  score  function  for  a  .  By  (4.7),  we  need  only  compute 


A(  t  o ;  C  2 )  =  VF^  (a  2 )  -  *¥n  (a 2 ),  which  is  the  difference  between  the  score  function  for  O  2 


with  r0  considered  as  missing  and  the  score  function  with  all  the  data  for  a  perfecdy  observed 
Gaussian  process. 

The  score  function  with  no  missing  data  is 


^(a2)  =  -4 


n  1 


1 


2  a2  2a* 


I  vr/f, 


(C.l) 


If  xlo  is  missing,  then  the  prediction  errors  and  the  variances  are  given  by  v,*  and/,*  in  (B.3). 
Hence,  for  the  leave-one-out  case 


A(r0;a2)  =  —r  +  [  v,£,//,J+1  “  ( vr*+i//<o+i  +  vf2,//r0)] 

2a  2a 


(C.2) 


-3_  +  _ L_ 

2a2  2a4 


( 1  +<t>2) 


-((^0  +  l-4«ro)2  +(xr0-K,-l)2) 


=  2^+27[‘(l+^2)^  +2<^0(*/0-l+*r0  +  l) 


_ 2£ 


(1  +  <I>2) 


(1+<J>2) 


-  1 


r2 

^tn+l 


r 

x4 

9  A2 

r2 

l<l+f2>  J 

X»o-l 

1  +  1 


2a2  2a‘ 


"(l  +  <t>2)*,2  +20xfo(xf#_1+xf#  +  i) 


(l+<r) 


which  is  the  same  as  (4.15). 

Assume  x,  is  a  perfectly  observed  Gaussian  process,  and  y,  behaves  according  to  some 


outlier  model.  Proceeding  as  in  AppendixC,  define  Dai  -  E(A(t  0;  a  ‘ )‘ ) 


an.: 
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5y(  (^o ;  ®  ~ )  "  Ay,  (r  g  ;  ct“  )  -  A(r0 ;  O  ~ )  where  Av(rg;c“)  is  the  difference  in  the  score  func 


tions  for  y(  .  Then 


£(Ay,(;0;  a2)2)  =  Dai  +  E 8y  (t 0:  a2 )2 


As  shown  below,  Dai  =  - 7  ,  and  E(8y  (t0  ;  C2)2)  is  evaluated  case  by  case  below.  From 

2  a 


(C. 3)  we  can  compute  EDV  via  (4.9). 


AO  Models: 


We  first  consider  an  AO  type  outlier  of  size  £  at  t0.  We  can  compute  Sy(r0;a2)2  in  the 
same  fashion  as  Appendix  B:  replace  xlg  by  xf(j  +  £  in  (C.2)  and  subtract  off  A(r0;a2).  Using 


the  same  notation  as  before, 


£5<f. .)('<>; °2)2  »  -Ai-£[;2(-d+4>2))+  C(-2(i+«2)x„+241uIj.1+x,<M))]2 

'  34  f  r  l2 

=  „  -TT(l  +<p2)2+  ~  l+4>2)2- 16<{>2(  l  +  <}>2)  +  8A2(  l  +o:)) 

oj  4 a4  [a  4a4  (l-4>2) 


,  JL  _L 


o  4a' 


(i+<n: 


2+  -^-(1-Hj>2) 


a  a 


Adding  Dai  to  (C. 4)  and  scaling  by  2  a4  gives  (4.16). 


The  calculations  for  when  r  g  -  1  (or  fg  +  1 )  we  left  out  proceed  as  follows: 


?2  [TTtW  +  (;  24“'-'2 


- = —  ( JCf  _t+JC.  .  ) 

(l+<t>2)  t0  1  f0  ■ 


=  JL  _J _ _ L 


a  J  4a  (l+<{r)  o  j  4a  (!—<{>) 


4o2-16  +  8 


(  1  +0")  (  1  +  0-  ) 


!  1  o4  c  *  j _ 


o  4a4  ( l+o2)2  I  o  a4  ( l+<$>2) 


This  leads  to  (4. 17). 
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•  *  -  -  P  A  -P 


For  outliers  of  size  £,  at  and  r0  +  ^-  *£  *s  easy  10  see  (as  in  (C.4))  that 

£  5(^,0- i,r0  +  l)^fO;  °2)  =  E  5(2;;r0/rO“  1;  O  2 ) ,  which  is  given  by  (C.5). 


/O  Models  ■ 


Proceeding  in  the  usual  way,  (4.22)  is  derived  from 


E8'°to)(t0-,a2)2  =  -J-g-E  C2  -( 1  +  02)  +  202  - 


k2  x  ^  _  _  0* 


+  C  (-2(1+4>2)  +  202)*,o+  2  +  (xr#_i+*r#+l) 


l4  ,  f  .4  I2 
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=  A.  -1-  (4?_i) — 4V  +  i 
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i_+  A  J - ! — 
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.  (t.  -  -2rr4n  ■»  fnr  f  s  f*  cin/*^ 


Also,  EDV^.to)  (r ;  a  )  =  2a  Z}ai  for  t  >  r0  since 

£5(^fo)(r0  +  i;a2)2  =  ( -( l  +  02)02i  +2(  1  +  02)02i -( 1  +  02)<t>2i ) 

+  ^(-2(l  +  02)0i+2(l+02)0i)xf  +  (20i+1-20i  +  1)(.*.  1+^ 


Computation  of  Dai 

Straightforward  algebra,  using  the  expectations  given  in  (B.10),  shows  that 


E  -(  1  +02)x(2  +  2  0x,  (*,  +x,  +  1) - j-(xf  +x,  +  1)2 
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Since  £  Ay  ( r ;  a2 )  =  0 ,  then 
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Hence, 
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